Piotr Dziubecki wrote:
> Hi Sergiu,
>
> On 10-11-22 12:58, Sergiu Dumitriu wrote:
>   
>> On 11/22/2010 11:20 AM, Piotr Dziubecki wrote:
>>     
>>> Hi Ricardo,
>>>
>>> On 10-11-19 19:37, Ricardo Rodriguez [eBioTIC.] wrote:
>>>       
>>>> Hi Piotr,
>>>>
>>>> Piotr Dziubecki wrote:
>>>>         
>>>>> Hi,
>>>>>
>>>>> today I noticed that something bad has happened to some of the
>>>>> attachments in my XWiki. Here is a
>>>>> screenshot from one of the affected pages:
>>>>>
>>>>> http://i.imgur.com/p6Xs7.png
>>>>>
>>>>> Take a look: several attachments have been uploaded, but only one is
>>>>> displayed in the attachment tab.
>>>>> The person who uploaded them says that they were fine yesterday, but
>>>>> today they have somehow disappeared.
>>>>>
>>>>> Strangely, there is no trace of any operation on them after the
>>>>> upload.
>>>>>
>>>>> I'm using XWiki Enterprise 2.5.32127 with a MySQL database (server version
>>>>> 5.1.47).
>>>>>
>>>>> For more context: over the last few days my users have started adding
>>>>> more attachments to their pages. The
>>>>> database dump is currently around 200 MB.
>>>>>
>>>>> I also looked at the logs and found several interesting fragments (all of
>>>>> the log snippets are from the time
>>>>> this was noticed):
>>>>>
>>>>> 2010-11-18 09:03:09,355
>>>>> [http://apps.man.poznan.pl:28181/xwiki/bin/download/Documents/Proposals/2009AUGURISProposalBPartSubmission.pdf?width=1262]
>>>>> ERROR web.XWikiAction                 - Connection aborted
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> 2010-11-18 13:23:53,118 
>>>>> [http://localhost:28181/xwiki/bin/view/Projects/Opinion+Mining] WARN
>>>>> xwiki.MyPersistentLoginManager  - Login cookie validation hash mismatch! 
>>>>> Cookies have been tampered with
>>>>> 2010-11-18 13:23:53,119 
>>>>> [http://localhost:28181/xwiki/bin/view/Projects/Opinion+Mining] WARN
>>>>> xwiki.MyPersistentLoginManager  - Login cookie validation hash mismatch! 
>>>>> Cookies have been tampered with
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> 2010-11-18 13:57:55,471 [Lucene Index Updater] WARN  
>>>>> lucene.AttachmentData           - error getting content
>>>>> of attachment [2009BEinGRIDwow2greenCONTEXTREVIEW.PPT] for document 
>>>>> [xwiki:Documents.Presentations]
>>>>> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException 
>>>>> from
>>>>> org.apache.tika.parser.microsoft.officepar...@72be25d1
>>>>>             at 
>>>>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:138)
>>>>>             at 
>>>>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>>>             at 
>>>>> com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>>>             at java.lang.Thread.run(Thread.java:662)
>>>>> Caused by: java.io.IOException: Cannot remove block[ 4209 ]; out of 
>>>>> range[ 0 - 3804 ]
>>>>>             at 
>>>>> org.apache.poi.poifs.storage.BlockListImpl.remove(BlockListImpl.java:98)
>>>>>             at 
>>>>> org.apache.poi.poifs.storage.RawDataBlockList.remove(RawDataBlockList.java:32)
>>>>>             at 
>>>>> org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java:99)
>>>>>             at 
>>>>> org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:164)
>>>>>             at 
>>>>> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:74)
>>>>>             at 
>>>>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>>>             ... 13 more
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 3999
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 4006
>>>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: 
>>>>> Followed by 4006
>>>>> 2010-11-18 15:05:10,412
>>>>> [http://apps.man.poznan.pl:28181/xwiki/bin/download/Documents/Presentations/2009AUGURISPSNCpresentationduringpreproposalmeetinginSaltzburg.ppt]
>>>>> ERROR web.XWikiAction                 - Connection aborted
>>>>>
>>>>>
>>>>>
>>>>> Unfortunately, the same thing happened again today with another group of
>>>>> users, in the same scenario: after the
>>>>> attachments were uploaded and the page was edited a few times, they were
>>>>> gone. Here is a snippet from the log for that period
>>>>> (there are many of these warnings):
>>>>>
>>>>> 2010-11-19 10:43:37,199 [Lucene Index Updater] WARN  util.PDFStreamEngine 
>>>>>            - java.io.IOException:
>>>>> Error: expected hex character and not  :32
>>>>> java.io.IOException: Error: expected hex character and not  :32
>>>>>             at 
>>>>> org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:316)
>>>>>             at 
>>>>> org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:138)
>>>>>             at 
>>>>> org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:549)
>>>>>             at 
>>>>> org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:383)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:372)
>>>>>             at 
>>>>> org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>>>             at 
>>>>> org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:74)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
>>>>>             at 
>>>>> org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>>>>>             at 
>>>>> org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
>>>>>             at 
>>>>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>>>             at 
>>>>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>>>             at 
>>>>> com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>>>             at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>>
>>>>> One more from another user:
>>>>>
>>>>> 2010-11-19 10:43:37,464 [Lucene Index Updater] WARN  util.PDFStreamEngine 
>>>>>            - java.io.IOException:
>>>>> Error: expected hex character and not  :32
>>>>> java.io.IOException: Error: expected hex character and not  :32
>>>>>             at 
>>>>> org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:316)
>>>>>             at 
>>>>> org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:138)
>>>>>             at 
>>>>> org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:549)
>>>>>             at 
>>>>> org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:383)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:372)
>>>>>             at 
>>>>> org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:61)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>>>             at 
>>>>> org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:74)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
>>>>>             at 
>>>>> org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
>>>>>             at 
>>>>> org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>>>>>             at 
>>>>> org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
>>>>>             at 
>>>>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>>>             at 
>>>>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:142)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>>>             at 
>>>>> com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>>>             at java.lang.Thread.run(Thread.java:662)
>>>>> 2010-11-19 11:32:39,900 [Lucene Index Updater] WARN  
>>>>> lucene.AttachmentData           - error getting content
>>>>> of attachment [2008BEinGRIDdesignconceptdiagramdoneinVisio.vsd] for 
>>>>> document [xwiki:Documents.Diagrams]
>>>>> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
>>>>> org.apache.tika.parser.microsoft.officepar...@54ad9fa4
>>>>>             at 
>>>>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:134)
>>>>>             at 
>>>>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>>>             at 
>>>>> com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>>>             at 
>>>>> com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>>>             at java.lang.Thread.run(Thread.java:662)
>>>>> Caused by: java.lang.IllegalArgumentException: Found a chunk with a 
>>>>> negative length, which isn't allowed
>>>>>             at 
>>>>> org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
>>>>>             at 
>>>>> org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
>>>>>             at 
>>>>> org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
>>>>>             at 
>>>>> org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
>>>>>             at 
>>>>> org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
>>>>>             at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:95)
>>>>>             at 
>>>>> org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:52)
>>>>>             at 
>>>>> org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:49)
>>>>>             at 
>>>>> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:127)
>>>>>             at 
>>>>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>>>             ... 13 more
>>>>>
>>>>>
>>>>> I'm counting on your help, since I don't know whether this is an XWiki
>>>>> issue or whether I have misconfigured something.
>>>>>
>>>>> Regards,
>>>>> Piotr
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users@xwiki.org
>>>>> http://lists.xwiki.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>>           
>>>> I think you could be facing two kinds of problems: one related to
>>>> memory availability (the one causing attachments to "disappear") and
>>>> another related to Lucene and some incompatibilities with
>>>> Microsoft Office files.
>>>>
>>>> Concerning the problem related to memory availability, please check
>>>> these two links:
>>>>
>>>> http://www.xwiki.org/xwiki/bin/view/FAQ/Howtoincreasethemaximumattachmentsize
>>>> http://www.xwiki.org/xwiki/bin/view/FAQ/HowToSolveAJavaHeapMemoryError
>>>>         
>>> I've already done that. I'm storing attachments of 20 MB without any
>>> errors during upload.
>>>
>>>       
>>>> I'm not sure whether these issues could lead to corrupted attachments or
>>>> only to failures in the process. But I think it is worth taking them into
>>>> account.
>>>>         
>>> What scares me is that even when something goes wrong, there is no
>>> visible warning and no transaction
>>> rollback. The operation just stops partway through, which confuses users.
>>>       
>> If you're using MySQL, then it's a limitation of the default MyISAM
>> engine, which doesn't support transactions. You should switch
>> to InnoDB.
>>
>> If you're not on MyISAM, then there's a bug in the storage.
>>     
>
> Thanks for that info. Indeed, we had the default engine enabled. We've now
> switched to InnoDB and we are
> monitoring our documents.
>
> I hope that will solve our problem.
>
> Thanks,
> Piotr
>   

Is this one more reason to always use InnoDB as the storage engine when
running XWiki with MySQL as the database? Thanks!
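
For anyone who needs to make the same switch, here is a minimal sketch (not an official XWiki procedure) that builds the ALTER TABLE statements for tables that are not yet on InnoDB. The function name, the schema name 'xwiki' and the sample rows are assumptions made for illustration; the (table name, engine) pairs would come from information_schema.tables on your own server. Take a dump of the database before running the generated statements.

```python
def innodb_conversion_statements(tables):
    """Build ALTER TABLE statements for tables not already on InnoDB.

    `tables` is an iterable of (table_name, engine) pairs, e.g. the rows of
        SELECT table_name, engine FROM information_schema.tables
        WHERE table_schema = 'xwiki';   -- schema name is an assumption
    """
    statements = []
    for name, engine in tables:
        if engine is None:
            # Views are reported with a NULL engine; nothing to convert.
            continue
        if engine.lower() == "innodb":
            continue
        # Quote the identifier and escape any backtick inside the name.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"ALTER TABLE {quoted} ENGINE=InnoDB;")
    return statements


if __name__ == "__main__":
    # Illustrative rows only; query your own server for the real list.
    sample = [("xwikidoc", "MyISAM"), ("xwikiattachment", "InnoDB")]
    for statement in innodb_conversion_statements(sample):
        print(statement)
```

The sketch only generates the statements, so they can be reviewed before being run in the mysql client. Converting the tables does not bring back attachments that were already lost.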

>   
>>>> There are some recent and quite interesting threads on the devs list
>>>> about a proposal from Caleb. Just look for "attachments" in the thread
>>>> titles there. Sorry if I'm repeating this proposal!
>>>>         
>>> Ok will do that.
>>>       
>>>> Concerning the Lucene errors: I need to solve this here as well. I've
>>>> also been seeing issues with Lucene and Office files. Do you mind if I
>>>> test with the attachments that are causing you problems? Are they very
>>>> big? Could you send me a couple of them or make them available somewhere?
>>>> On Monday I can install a recent XE snapshot on my dev box and you could
>>>> upload them there, but I would like to try them on my laptop before then.
>>>>
>>>>         
>>> I need to ask whether I can share those documents with others. If so, I'll
>>> send you some examples.
>>>
>>>       
>>>> Thanks!
>>>>
>>>> Cheers,
>>>>
>>>> Ricardo
>>>>         

-- 
Ricardo Rodríguez
CTO
eBioTIC.
Life Sciences, Data Modeling and Information Management Systems

