Re: A problem in the right-to-left languages

2011-11-06 Ahmad Ajiloo
Hi,
Did your inquiry lead to any result?

On Wed, Nov 2, 2011 at 4:40 AM, Ken Krugler wrote:
> I know some of the original team members - I could ask.
>
> Are there specific questions, or just "is anybody still minding the fire"?
>
> -- Ken
>
> On Nov 1, 2011, at 2:43pm, Nick Burch wrote:
>
> > On Tue

Re: location of pdfbox in sources of Tika

2011-11-01 Ahmad Ajiloo
> apache.pdfbox
> pdfbox
> 1.5.0
>
> Also change the version tag to the appropriate number. Then go to ../tika-site (the top-level directory of the Tika project) and rerun mvn clean install.
>
> If all went well, you will have a new Tika.
>
> Hope it helps,

Re: A problem in the right-to-left languages

2011-11-01 Ahmad Ajiloo
jar file?

On Mon, Oct 31, 2011 at 10:49 PM, Robert Muir wrote:
> Do you have the ICU4J jar in your classpath in both situations?
>
> On Mon, Oct 31, 2011 at 1:35 PM, ahmad ajiloo wrote:
> > Hello
> > When I use Tika for extracting my Persian PDF files, all the characters
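Robert's question (is ICU4J on the classpath in both setups?) can be answered mechanically. The sketch below is a minimal, hypothetical helper, not part of Tika, PDFBox or Nutch: it assumes that trying to load the real ICU4J class com.ibm.icu.text.Bidi is a good enough signal that the ICU4J jar is visible. Run it once with the classpath of the failing setup and once with the classpath used by Nutch or the GUI, then compare the two outputs.

```java
// Hypothetical diagnostic helper (not a Tika/PDFBox API): is ICU4J visible on the classpath?
public class IcuCheck {

    // Returns true if the ICU4J Bidi class can be found by this class's loader.
    static boolean isIcu4jAvailable() {
        try {
            // Look the class up without initializing it; only its presence matters here.
            Class.forName("com.ibm.icu.text.Bidi", false, IcuCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ICU4J on classpath: " + isIcu4jAvailable());
        // Also print the raw classpath so the two environments can be diffed directly.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the two runs disagree, that difference alone could explain why one setup produces right-to-left text and the other does not.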

location of pdfbox in sources of Tika

2011-10-31 Ahmad Ajiloo
Hello,
I have edited a file in the pdfbox project and want to rebuild Tika with the modified file, but I can't find where the pdfbox sources are located in the Tika sources so that I can change it. Can anyone help me?
Thanks

A problem in the right-to-left languages

2011-10-31 Ahmad Ajiloo
Hello,
When I use Tika to extract text from my Persian PDF files, all the characters come out in reverse order: the characters of each line run from the beginning of the line to the end, but ordered left to right. However, when I use the Tika GUI via Nutch there is no problem and the output text is right-to-left
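When right-to-left text comes out in visual (left-to-right) order, a common stopgap is to reverse each extracted line back into logical order. The sketch below is a naive heuristic, not what Tika or PDFBox actually do, and the class and method names are hypothetical. It assumes the whole line is a right-to-left paragraph and contains only BMP characters (no surrogate pairs). It reverses the line, then restores the original order inside each run of left-to-right characters (Latin letters and European digits) so that embedded numbers are not mirrored. A known limitation: a multi-word Latin phrase will come out with its words swapped, because the spaces between them are not part of any run. A proper fix would use a full Unicode Bidi implementation such as ICU4J's com.ibm.icu.text.Bidi or the JDK's java.text.Bidi.

```java
// Hypothetical helper: naive visual-to-logical reordering of one RTL line (heuristic only).
public class VisualLineFixer {

    // True for characters whose internal order must be preserved: strong LTR letters and European digits.
    private static boolean isLtr(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_LEFT_TO_RIGHT
            || d == Character.DIRECTIONALITY_EUROPEAN_NUMBER;
    }

    // Reverses the whole line, then flips each maximal LTR run back. Assumes BMP-only text.
    static String visualToLogical(String line) {
        char[] chars = new StringBuilder(line).reverse().toString().toCharArray();
        int i = 0;
        while (i < chars.length) {
            if (!isLtr(chars[i])) {
                i++;
                continue;
            }
            int start = i;
            while (i < chars.length && isLtr(chars[i])) {
                i++;
            }
            // Restore the original reading order of the LTR run [start, i).
            for (int a = start, b = i - 1; a < b; a++, b--) {
                char tmp = chars[a];
                chars[a] = chars[b];
                chars[b] = tmp;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Visual order as it might be extracted: the number first, then the Persian letters mirrored.
        String visual = "2011 \u0645\u0627\u0644\u0633";
        // Logical order: the letters of "salam" (U+0633 U+0644 U+0627 U+0645), a space, then 2011.
        String expected = "\u0633\u0644\u0627\u0645 2011";
        System.out.println(visualToLogical(visual).equals(expected));
    }
}
```

Running main prints true. Pure left-to-right input such as "abc" passes through unchanged, since the whole-line reversal is undone by the run reversal.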

[jira] [Updated] (TIKA-713) Tika can not parse all of the persian pdf files

2011-10-31 Ahmad Ajiloo (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmad Ajiloo updated TIKA-713: -- Attachment: Simple3.pdf, Complex.pdf. I attached these two files for further investigation.

[jira] [Commented] (TIKA-713) Tika can not parse all of the persian pdf files

2011-10-31 Ahmad Ajiloo (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140134#comment-13140134 ] Ahmad Ajiloo commented on TIKA-713: --- I'm testing new Encoding.java file w

[jira] [Updated] (TIKA-713) Tika can not parse all of the persian pdf files

2011-10-31 Ahmad Ajiloo (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmad Ajiloo updated TIKA-713: -- Attachment: Simple2.pdf > Tika can not parse all of the persian pdf fi

[jira] [Commented] (TIKA-713) Tika can not parse all of the persian pdf files

2011-10-05 Ahmad Ajiloo (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121376#comment-13121376 ] Ahmad Ajiloo commented on TIKA-713: --- Thanks a lot > Tika can no

[jira] [Updated] (TIKA-713) Tika can not parse all of the persian pdf files

2011-09-12 Ahmad Ajiloo (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmad Ajiloo updated TIKA-713: -- Attachment: ebrat.pdf This is a Persian PDF file that Tika can't parse. > Tika can not pars

[jira] [Created] (TIKA-713) Tika can not parse all of the persian pdf files

2011-09-12 Ahmad Ajiloo (JIRA)
Versions: 0.9
Reporter: Ahmad Ajiloo
Fix For: 0.9

Hello,
I used Tika (in Nutch, of course) to parse some Persian PDF files. Some of the files were cleanly converted to plain text, but for others the output was corrupted. I used the ICU4J v4 library and the text changed to right