[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502684#comment-14502684 ]

Hudson commented on TIKA-1511:

SUCCESS: Integrated in tika-trunk-jdk1.7 #634 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/634/])
TIKA-1511, move xerial dependency to 'provided' (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1674800)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-app/src/main/appended-resources/META-INF/LICENSE
* /tika/trunk/tika-parsers/pom.xml
* /tika/trunk/tika-parsers/src/main/appended-resources/META-INF/LICENSE

Create a parser for SQLite3
---
Key: TIKA-1511
URL: https://issues.apache.org/jira/browse/TIKA-1511
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.6
Reporter: Luis Filipe Nassif
Fix For: 1.8
Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, TIKA-1511v3.patch, TIKA-1511v3bis.patch, testSQLLite3b.db, testSQLLite3b.db

I think it would be very useful, as sqlite is used as data storage by a wide range of applications. Opening the ticket to track it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386562#comment-14386562 ]

Tim Allison commented on TIKA-1511:

Thank you, [~thetaphi]. I was aware of about half of that, but I'm very grateful to have the full story from an expert and to know that I won't break Solr. I agree about the benefits of segregating parsers. As Konstantin pointed out, we're trying to head in that direction. Thank you, again!
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386648#comment-14386648 ]

Hudson commented on TIKA-1511:

SUCCESS: Integrated in tika-trunk-jdk1.7 #583 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/583/])
TIKA-1511 include xerial and native libs; some cleanup of README in preparation for 1.8 release (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670069)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-parsers/pom.xml
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387865#comment-14387865 ]

Hudson commented on TIKA-1511:

SUCCESS: Integrated in tika-trunk-jdk1.7 #589 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/589/])
TIKA-1511: add public domain license notice for Sqlite to main License.txt (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670239)
* /tika/trunk/LICENSE.txt
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385802#comment-14385802 ]

Konstantin Gribov commented on TIKA-1511:

+1 for including xerial in tika-app and tika-server. If you want to include it in tika-parsers as a regular (neither provided nor optional) dependency, we should add an explicit note about the presence of native libs in tika-parsers. Then it'll be OK. As far as I know, Solr 5.0+ is no longer a classic webapp (as it was before) but a standalone app, so it shouldn't have such classloading issues, since its parts aren't redeployed while Solr is running.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385803#comment-14385803 ]

Uwe Schindler commented on TIKA-1511:

Solr uses ANT + IVY to build. We don't use transitive dependencies at all! So whenever TIKA is updated, the person doing the update prints the dependency tree and then fills all required information into the ivy.xml file and our ivy-versions.properties file :-) In general, we carefully decide which dependencies are really needed. Because TIKA automatically disables parsers that do not load, we have already removed various dependencies (like the netcdf parser, which is LGPL, and the ASM parser, since we don't support indexing Java class files by default).

For the current one: we don't want native libraries anywhere (we don't even ship our own native libs for WindowsDirectory; users have to compile those themselves with MSVC/gcc). So we would not ship with SQLite support by default.

In general, it would be good to have an easier plugin mechanism that allows Solr to pick only the parsers it ships by default, with the rest downloadable by the user (e.g. by a script). So it would be good to have multiple parser JARs: maybe put all the crazy parsers that fork processes or call native libs into a separate TIKA parser bundle. The default one should contain only pure-Java stuff with as few dependencies as possible...
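Uwe's point that TIKA disables parsers which do not load is what lets Solr drop JARs it doesn't want. A minimal, Tika-independent sketch of that kind of tolerant loading follows; it is not Tika's actual loading code, and the class and method names are hypothetical. It tries to load each class by name and skips any whose dependencies are missing.

```java
import java.util.ArrayList;
import java.util.List;

public class TolerantLoader {

    // Loads each named class and keeps only those that load and initialize cleanly.
    // A class whose dependencies are absent (a missing JAR, or a native library
    // loaded in a static initializer) fails with ClassNotFoundException or a
    // LinkageError such as NoClassDefFoundError or UnsatisfiedLinkError.
    public static List<Class<?>> loadAvailable(List<String> classNames, ClassLoader loader) {
        List<Class<?>> available = new ArrayList<>();
        for (String name : classNames) {
            try {
                available.add(Class.forName(name, true, loader));
            } catch (ClassNotFoundException | LinkageError e) {
                // Skip it: this component is simply unavailable in this deployment.
            }
        }
        return available;
    }
}
```

Catching LinkageError rather than only ClassNotFoundException is the design choice that matters here: a class can be present on the classpath and still fail once its own dependencies are resolved.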
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385836#comment-14385836 ]

Konstantin Gribov commented on TIKA-1511:

The idea of better separating the tika-parsers module was discussed some time ago; it's also mentioned in the Tika 2.0 roadmap (https://wiki.apache.org/tika/Tika2_0RoadMap). In that case, users would get the appropriate {{tika-parsers-*}} modules with their dependencies (e.g., via {{mvn dependency:copy}} or something similar), and Solr could depend only on {{tika-core}} and a minimal set of {{tika-parsers-*}} modules. Solr could also depend only on {{tika-core}}, but that would lead to the standard "why doesn't it work?" questions, as with {{slf4j}} in Solr 4.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385773#comment-14385773 ]

Tim Allison commented on TIKA-1511:

Any objections to including xerial with app and server rather than marking it provided? We can include instructions for excluding it on unsupported operating systems or in webapps with security/native-lib restrictions.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385782#comment-14385782 ]

Tim Allison commented on TIKA-1511:

[~thetaphi], will there be any problems for Solr if we remove the provided scope for xerial in the tika-parsers pom?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320059#comment-14320059 ]

Konstantin Gribov commented on TIKA-1511:

With the v3 patch, forbiddenapis found that {{SQLite3RowReader}} uses {{SimpleDateFormat}} without an explicit {{locale}}. I hope it's enough to use {{Locale.getDefault()}}. I also fixed {{TestSQLLiteParser}}: it tried to load a missing test resource, which seems to have been renamed to {{testSqlite3b.db}}. The tests pass with that change. Do we also need {{testSQLITE3.db}} in {{tika-parsers}}? I can't find any test that uses this file.
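The forbiddenapis complaint is about the {{SimpleDateFormat(String)}} constructor, which silently uses the JVM default locale. Passing a {{Locale}} makes the choice explicit: {{Locale.getDefault()}} keeps the current behaviour, while a fixed locale such as {{Locale.US}} makes the output identical on every machine. A minimal sketch with a hypothetical helper, not the actual {{SQLite3RowReader}} code:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ExplicitLocaleDates {

    // Formats a date with an explicitly chosen locale and time zone, instead of
    // the locale-less SimpleDateFormat(String) constructor that forbiddenapis flags.
    public static String format(Date date, String pattern, Locale locale, TimeZone zone) {
        // SimpleDateFormat is not thread-safe, so create one per call (or per thread).
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        fmt.setTimeZone(zone);
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(format(epoch, "dd MMM yyyy", Locale.US, utc)); // 01 Jan 1970
    }
}
```

Month names are where the locale shows up: the same {{MMM}} pattern would print a different abbreviation under, say, a German or Russian default locale.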
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320088#comment-14320088 ]

Tim Allison commented on TIKA-1511:

[~gagravarr], I can't find a mime test that uses testSQLITE3.db in various revs of {{TestMimeTypes.java}} for TIKA-1502. Did you add one at some point? If not, should we remove that file and add a mime test for the sqlite test file that I added for the parser?
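A MIME test for the sqlite test file essentially comes down to the SQLite 3 header: every SQLite 3 database file begins with the 16 bytes "SQLite format 3" followed by a NUL. A minimal sketch of that check (the class name is hypothetical; Tika's own detection is driven by the magic rules in tika-mimetypes.xml, not by this code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SqliteMagic {

    // Every SQLite 3 database file starts with "SQLite format 3" followed by a NUL byte.
    public static final byte[] MAGIC = "SQLite format 3\0".getBytes(StandardCharsets.US_ASCII);

    // True if the first bytes of a file match the SQLite 3 header.
    public static boolean looksLikeSqlite3(byte[] head) {
        return head.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }
}
```

A test along these lines would read the first 16 bytes of the test document and assert the match, independently of which parser is registered.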
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320103#comment-14320103 ]

Konstantin Gribov commented on TIKA-1511:

[~talli...@mitre.org], r1659547 works fine. The sqlite3 tests pass. Thanks :)
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320107#comment-14320107 ]

Tim Allison commented on TIKA-1511:

Great. Thank you. Let me know if we should make any changes in the format of the output or if there are any surprises.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320073#comment-14320073 ]

Tim Allison commented on TIKA-1511:

Oh, wait, those were errors in v3 of my patch attached here. I made several changes to v3 before committing. You shouldn't see the misspelled testSQLLite3b.db in trunk, and I fixed the date format before committing. Let me know if you see these in trunk; I don't.

As for testSQLITE3.db, it was added for a mime test. I'm looking into r1647473 and its history now to see where that test was/is. Following the principle of "do no harm", I chose not to remove or replace it.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320194#comment-14320194 ]

Tim Allison commented on TIKA-1511:

Very cool. Thank you for checking on that. It looks like this is only a Windows issue: each time I run Tika and it hits a sqlite3 file, I get a file such as {{sqlite-3.8.7-2ee1c7aa-2ec8-47ad-bf74-073acc79a850-sqlitejdbc.dll}}, and these files are not deleted. If we'd prefer to include xerial's jar with our bundle to make integration easier (for those not in webapp environments :) ), I'm happy to make the change.
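To see how many of these files have piled up, a small helper can list them. A minimal sketch, assuming the leftovers sit directly in the temp directory and follow the names reported in this thread ({{sqlite-...-sqlitejdbc.dll}} on Windows, {{/tmp/sqlite-*.so}} on Linux); the glob and the helper are hypothetical and not part of xerial's sqlite-jdbc:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SqliteTempFiles {

    // Hypothetical glob based on the file names reported in this thread.
    static final String GLOB = "sqlite-*.{dll,so}";

    // Returns the sorted names of extracted native-library files found in tempDir.
    public static List<String> findLeftovers(Path tempDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tempDir, GLOB)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Deletion is deliberately left to the caller: on Windows, a DLL that is still loaded by a running JVM cannot be removed, so any cleanup pass should tolerate failures.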
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320042#comment-14320042 ]

Tim Allison commented on TIKA-1511:

Mea culpa. Give r1659547 a try. What would be the benefit of optional vs. provided?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320063#comment-14320063 ]

Tim Allison commented on TIKA-1511:

Will fix now. No idea how my tests passed with those errors... Thank you.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320183#comment-14320183 ]

Konstantin Gribov commented on TIKA-1511:

I don't see a lot of {{/tmp/sqlite-*.so}} files, only one while the db is open. After the connections/db are closed, it is removed automagically.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320354#comment-14320354 ]

Tim Allison commented on TIKA-1511:

Would anyone be able to offer help on this one? Are permissions issues preventing xerial's wrapper from writing the .so files to Jenkins' temp folder?

{noformat}
Error Message

org.sqlite.core.NativeDB._open(Ljava/lang/String;I)V

Stacktrace

java.lang.UnsatisfiedLinkError: org.sqlite.core.NativeDB._open(Ljava/lang/String;I)V
	at org.sqlite.core.NativeDB._open(Native Method)
	at org.sqlite.core.DB.open(DB.java:161)
	at org.sqlite.core.CoreConnection.open(CoreConnection.java:145)
	at org.sqlite.core.CoreConnection.<init>(CoreConnection.java:66)
	at org.sqlite.jdbc3.JDBC3Connection.<init>(JDBC3Connection.java:21)
	at org.sqlite.jdbc4.JDBC4Connection.<init>(JDBC4Connection.java:23)
	at org.sqlite.SQLiteConnection.<init>(SQLiteConnection.java:45)
	at org.sqlite.JDBC.createConnection(JDBC.java:114)
	at org.sqlite.SQLiteConfig.createConnection(SQLiteConfig.java:101)
{noformat}
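One way to rule the permissions theory in or out on a build machine is to check whether a file can actually be created in the JVM temp directory. A minimal sketch, assuming the native library is extracted under {{java.io.tmpdir}}; the class is hypothetical and does not call into sqlite-jdbc:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TempDirProbe {

    // Returns true if a file can really be created (and removed) in dir.
    // A real write is used because a permission check alone may not tell the
    // whole story (ACLs, mount options, security managers).
    public static boolean canCreateFilesIn(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "probe-", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    // The directory the JVM reports as its default temp location.
    public static Path defaultTempDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }
}
```

A writable directory is necessary but not sufficient: for example, a temp directory mounted {{noexec}} can still prevent an extracted .so from being loaded.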
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320360#comment-14320360 ]

Tim Allison commented on TIKA-1511:

Perhaps we should revert to 3.8.6, according to [this|https://bitbucket.org/xerial/sqlite-jdbc/issue/152/387-version-linux-issue]?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320371#comment-14320371 ]

Tim Allison commented on TIKA-1511:

Reverted to 3.8.6 in r1659598. If anyone has an Ubuntu machine and wants to try reverting versions until one works, that would be better than me trying through Hudson. :) Let's see if 3.8.6 is the charm.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320307#comment-14320307 ]

Hudson commented on TIKA-1511:

UNSTABLE: Integrated in tika-trunk-jdk1.7 #489 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/489/])
TIKA-1511, third time is the charm...many apologies (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1659547)
* /tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/Database.java
TIKA-1511, with new files added...doh (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1659545)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/AbstractDBParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/JDBCTableReader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/SQLite3DBParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/SQLite3Parser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/SQLite3TableReader.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/jdbc
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/jdbc/SQLite3ParserTest.java
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320678#comment-14320678 ]

Tim Allison commented on TIKA-1511:

Reverting to an earlier version of sqlite-jdbc worked, but I find it unsettling. Do we want to include this parser as part of the standard distro, or should we offer it as a third-party parser? The licenses are good, but dependencies on native libs give me some concern... especially after that experience.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320617#comment-14320617 ]

Hudson commented on TIKA-1511:

SUCCESS: Integrated in tika-trunk-jdk1.7 #490 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/490/])
TIKA-1511 try to revert to earlier version of sqlite-jdbc to avoid unsatisfiedlikeerror on ubuntu (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1659598)
* /tika/trunk/tika-parsers/pom.xml
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320797#comment-14320797 ]

Luis Filipe Nassif commented on TIKA-1511:

As there are native libs only for Windows, Linux and Mac OS X, maybe adding a check for these operating systems in getSupportedTypes could make the parser more robust?
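Luis's suggestion could look roughly like this: report the SQLite type only on operating systems for which a native library is bundled. A sketch under stated assumptions: the MIME type string {{application/x-sqlite3}} is assumed, the {{os.name}} prefixes are the values common JVMs report ("Windows 7", "Linux", "Mac OS X"), and the class is hypothetical. Tika's real hook is {{getSupportedTypes(ParseContext)}}, which returns a {{Set<MediaType>}}; plain strings keep this example dependency-free.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

public class OsGatedTypes {

    // Assumed MIME type for SQLite 3 files.
    public static final String SQLITE3 = "application/x-sqlite3";

    // Native libs are reported to exist only for Windows, Linux and Mac OS X.
    public static boolean hasBundledNativeLib(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        return os.startsWith("windows") || os.startsWith("linux") || os.startsWith("mac");
    }

    // Advertise no types where the native lib cannot load, so auto-detection
    // never routes SQLite files to a parser that would fail.
    public static Set<String> supportedTypes(String osName) {
        return hasBundledNativeLib(osName)
                ? Collections.singleton(SQLITE3)
                : Collections.<String>emptySet();
    }
}
```

In a parser this would be called with {{System.getProperty("os.name")}}. It only checks the OS name; a bundled library can still be unusable on a particular CPU architecture, so handling {{UnsatisfiedLinkError}} at parse time would remain prudent.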
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320022#comment-14320022 ]

Konstantin Gribov commented on TIKA-1511:

[~talli...@mitre.org], you can also make it {{<optional>true</optional>}} instead of {{provided}}. Also, I can't find the parser itself ({{org.apache.tika.parser.jdbc.SQLite3Parser}}) in trunk rev 1659449.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319457#comment-14319457 ]

Hudson commented on TIKA-1511:

SUCCESS: Integrated in tika-trunk-jdk1.7 #487 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/487/])
TIKA-1511 add parser for sqlite3 (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1659449)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-bundle/pom.xml
* /tika/trunk/tika-parsers/pom.xml
* /tika/trunk/tika-parsers/src/main/appended-resources/META-INF/LICENSE
* /tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSqlite3b.db
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318232#comment-14318232 ] Tim Allison commented on TIKA-1511: --- Bottom line: it will be simpler to treat the full db with all tables as one big file. We can still treat clobs and blobs as embedded documents.

Details: When I tried to cut out the {{JDBCInputStream}} and just send in a zero-byte {{InputStream}}, regular parsing worked properly. However, if a user tries to use a {{ParserContainerExtractor}}, that fails to reach the BLOBs because of this:
{code}
MediaType type = detector.detect(tis, metadata);

if (extractor == null) {
    // Let the handler process the embedded resource
    handler.handle(filename, type, tis);
} else {
    // Use a temporary file to process the stream twice
    File file = tis.getFile();

    // Let the handler process the embedded resource
    InputStream input = TikaInputStream.get(file);
    try {
        handler.handle(filename, type, input);
    } finally {
        input.close();
    }

    // Recurse
    extractor.extract(tis, extractor, handler);
}
{code}
When the extractor is called below the {{//Recurse}} comment, it only sees the zero-byte {{TikaInputStream}}. It does not see the {{type}} or the {{metadata}}. So, in the case of {{AutoDetectParser}}, it only sees a zero-byte {{InputStream}} and therefore detects it as {{application/octet-stream}}. In short, there is currently no way to pass the detected type through to the extractor. We could, of course, add a parameter for {{type}} or {{metadata}} to the ParserContainerExtractor's {{extract}} signature...
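The sketch below illustrates why a zero-byte stream falls through to {{application/octet-stream}}. It is a minimal example of magic-byte detection, not Tika's detector. It relies only on the fact that SQLite 3 files begin with the 16-byte header "SQLite format 3" followed by a NUL byte. The class name {{MagicSketch}} and the type string {{application/x-sqlite3}} are assumptions for illustration. With an empty prefix there is nothing to match, so a magic-based detector can only return the generic type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not Tika's MimeTypes/Detector implementation.
final class MagicSketch {
    // SQLite 3 database files begin with the 16 bytes "SQLite format 3" followed by NUL.
    static final byte[] SQLITE_MAGIC =
            "SQLite format 3\0".getBytes(StandardCharsets.US_ASCII);

    /**
     * Guesses a media type from the leading bytes of a stream.
     * The returned type names are illustrative assumptions.
     */
    static String detect(byte[] prefix) {
        if (prefix.length >= SQLITE_MAGIC.length
                && Arrays.equals(Arrays.copyOf(prefix, SQLITE_MAGIC.length), SQLITE_MAGIC)) {
            return "application/x-sqlite3";
        }
        // An empty (zero-byte) prefix can never match, so it falls through to the generic type.
        return "application/octet-stream";
    }
}
```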
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298791#comment-14298791 ] Luis Filipe Nassif commented on TIKA-1511: -- Hi [~talli...@apache.org], I am OK with removing the virtual CSV/HTML InputStream (there is no embedded table stream, as you pointed out before), but I think an InputStream that cannot be read is strange. Maybe we should back off to the big-doc approach... What are the advantages of handling each table as an embedded doc?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299000#comment-14299000 ] Tim Allison commented on TIKA-1511: --- From a search perspective, the search experience is typically better with smaller documents than with enormous ones. As for the oddity, yes, I agree, but we do it in AbstractPOIFSExtractor.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299171#comment-14299171 ] Luis Filipe Nassif commented on TIKA-1511: -- For search, could we split the big XHTML output with a ContentHandlerDecorator?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299175#comment-14299175 ] Tim Allison commented on TIKA-1511: --- We could... I'm more inclined to go with the RecursiveParserWrapper, but parsing should work.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298670#comment-14298670 ] Tim Allison commented on TIKA-1511: --- Thank you, Nick, for reviewing this! I'll fix the wildcards and the assertContains... not sure how those crept in. I'm not happy with the added complexity of the JDBCInputStream.

Bottom line: should we get rid of that option and back off to a zero-byte InputStream, grabbing the table object from the OpenContainer? That would simplify quite a bit, including detection... And it would make this parser behave like the PST parser... I think. If we really want to add it later, we can, but simpler is better... [~lfcnassif], would you be OK with that proposal?

As for another JDBC-based format, I completely agree. Can you recommend another single-file db format? Access comes to mind, but I can't find a pure Java parser that has JDBC: Jackcess (LGPL) has its own API and doesn't support JDBC. I looked briefly at Derby, HSQLDB and MySQL, and they all seem to rely on a directory of files... I very well could have missed a single-file option for those, though...
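The sketch below shows the "zero-byte InputStream plus open container" idea. It uses hypothetical stand-in classes ({{TableRef}}, {{ContainerStream}}, {{EmbeddedTableParserSketch}}) rather than TikaInputStream itself. The stream has no bytes to read. A parser that knows about the container pulls the table handle from it instead of reading the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/** Hypothetical handle to a table inside an already-open database. */
final class TableRef {
    final String tableName;

    TableRef(String tableName) {
        this.tableName = tableName;
    }
}

/** Hypothetical zero-byte stream that carries an already-open container object. */
final class ContainerStream extends ByteArrayInputStream {
    private final Object openContainer;

    ContainerStream(Object openContainer) {
        super(new byte[0]); // deliberately empty: a table has no standalone bytes
        this.openContainer = openContainer;
    }

    Object getOpenContainer() {
        return openContainer;
    }
}

final class EmbeddedTableParserSketch {
    /** Uses the open container when one is attached; ordinary streams are just labeled here. */
    static String describe(InputStream stream) {
        if (stream instanceof ContainerStream) {
            Object container = ((ContainerStream) stream).getOpenContainer();
            if (container instanceof TableRef) {
                // A real parser would run its query against the table here.
                return "table:" + ((TableRef) container).tableName;
            }
        }
        // A real parser would read and parse the bytes of an ordinary stream.
        return "stream";
    }
}
```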
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298702#comment-14298702 ] Tim Allison commented on TIKA-1511: --- H2 appears to be MPL _or_ EPL. According to the [Apache legal FAQ|http://www.apache.org/legal/resolved.html], MPL 2.0 is good as long as we include the license info and the disclaimer. So, H2 should work, no? Other candidates?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298416#comment-14298416 ] Nick Burch commented on TIKA-1511: -- A few minor things on Tim's GitHub branch for this: I'm seeing some wildcard imports being added, and some assertContains being replaced with assertTrue(str.contains); the latter doesn't give as helpful an exception when the assert fails. Does the branch need updating, or are there spurious changes that've come in?

I've had a quick look at the diff to the branch, but not a full one. My initial impression is that there was more logic than I'd expected in JDBCResultSetInputStream and JDBCRowReader, but not necessarily a problematic amount.

I'm still not entirely sure of the idea that, depending on how you access the embedded stream, you get different behaviour. If you have a Word document embedded in a PDF, the embedded stream doesn't say "I'll give you Word if you ask one way, Plain Text if you ask another"; it just says "here's the content type, you'll need to find a suitable parser or fail trying".

For the specific use case of something that iterates through a file, dumping out all embedded resources without parsing them, if we do support it for these JDBC tables (I'm tempted to say for that use case we don't return anything for the table), we could just have a special-case wrapper which parses to HTML as normal and returns that, rather than messing around with "maybe HTML via JDBC, maybe magically CSV".

Also, it'd be good if we could have implementations for 2 different JDBC-based formats. That should help us verify we've got the split between the abstract JDBC and SQLite parts correct!
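The split between the abstract JDBC part and the SQLite-specific part can be pictured as a small template-method sketch. The class names are hypothetical. The SQLite pieces assume the xerial driver's {{jdbc:sqlite:}} URL prefix and SQLite's {{sqlite_master}} catalog table. Under this design, a second JDBC-based format would only need to supply its own URL and table-listing query.

```java
/** Hypothetical format-neutral part: everything that needs only a JDBC URL and a table-listing query. */
abstract class AbstractDbDialect {
    /** JDBC URL for a database stored in a single local file. */
    abstract String connectionString(String filePath);

    /** SQL returning one row per table, with the table name in the first column. */
    abstract String tableListingSql();

    /** Shared logic: the query used to dump one table, with the identifier double-quoted. */
    String selectAllSql(String table) {
        return "SELECT * FROM \"" + table.replace("\"", "\"\"") + "\"";
    }
}

/** Hypothetical SQLite-specific part, assuming the xerial driver's URL prefix. */
final class Sqlite3Dialect extends AbstractDbDialect {
    @Override
    String connectionString(String filePath) {
        return "jdbc:sqlite:" + filePath;
    }

    @Override
    String tableListingSql() {
        // May also list internal tables such as sqlite_sequence.
        return "SELECT name FROM sqlite_master WHERE type = 'table'";
    }
}
```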
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294402#comment-14294402 ] Luis Filipe Nassif commented on TIKA-1511: -- Hmm... RecursiveParserWrapper is very cool! I had not had a chance to look at it before, thank you. Currently I am doing something similar with a custom EmbeddedDocumentExtractor. RecursiveParserWrapper can certainly help with that use case!
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291829#comment-14291829 ] Tim Allison commented on TIKA-1511: --- The RecursiveParserWrapper should allow that, no?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291148#comment-14291148 ] Luis Filipe Nassif commented on TIKA-1511: -- My specific use case is to produce a single XHTML file for each table that can be displayed to the user.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290303#comment-14290303 ] Tim Allison commented on TIKA-1511: --- I'm not sure I understand the need for that. Won't you be able to send in whatever handler you want via the regular call to parse and by attaching a ParsingEmbeddedDocumentExtractor? What, exactly, do you want to have when Tika has finished processing the SQLite file?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285621#comment-14285621 ] Luis Filipe Nassif commented on TIKA-1511: -- No problem, the design looks good!
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285568#comment-14285568 ] Tim Allison commented on TIKA-1511: ---
{quote}
A) I think it will work, as the patch works now. But I think an inputStream that can not be read is a bit strange.
{quote}
Agreed. The new proposal is to make the InputStream readable, but the regular use case of an AutoDetectParser sent in via ParseContext won't bother to read the InputStream; rather, it will read the table object and use the user-supplied ContentHandler.
{quote}
B) Could it be better to send a xHTML inputStream with markup to client instead of simple UTF-8 encoded CSV?
{quote}
We could, but there are other ways of getting that... RecursiveParserWrapper, or a custom recursive embedded parser handler, or even just sending in the plain AutoDetectParser as the EmbeddedDocumentExtractor/Parser in ParseContext. The idea behind this is to support a ParserContainerExtractor that would normally pull just the bytes from embedded documents... because there are no bytes for a table object (i.e. it never exists as an actual standalone file), I propose a CSV proxy.
{quote}
C) I agree, but it will work only if he adds the correct parser (eg TableParser or CompositeParser) to ParseContext, right?
{quote}
The user will have to add an AutoDetectParser to the ParseContext, and we will need to add {{org.apache.tika.parser.jdbc.SQLite3Parser}} and {{org.apache.tika.parser.jdbc.JDBCTableParser}} to the parser services file.

I have a draft of this proposal working. The current downside is that if the client resets and rereads the InputStream, the blobs/clobs are processed twice via the EmbeddedDocumentExtractor. Any problems with the above? Recommendations for an alternate design?
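Here is a minimal sketch of such a CSV proxy. It assumes column values have already been read as strings, with BLOB and CLOB columns left out. It uses RFC 4180-style quoting with CRLF line endings. {{CsvProxySketch}} is a hypothetical name, not a class in the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical CSV proxy: renders already-stringified table rows as UTF-8 CSV bytes. */
final class CsvProxySketch {
    /** Quotes a field when it contains a comma, double quote, CR or LF (RFC 4180 style). */
    static String field(String value) {
        if (value == null) {
            return ""; // SQL NULL becomes an empty field
        }
        if (value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Renders a header row plus data rows as UTF-8 CSV with CRLF line endings. */
    static byte[] toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header);
        for (List<String> row : rows) {
            appendRow(sb, row);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    private static void appendRow(StringBuilder sb, List<String> row) {
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(field(row.get(i)));
        }
        sb.append("\r\n");
    }
}
```

The returned bytes could be wrapped in a {{ByteArrayInputStream}} for clients that only copy bytes. For a large table, a streaming variant would avoid holding the whole CSV in memory.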
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283883#comment-14283883 ] Tim Allison commented on TIKA-1511: --- Hi [~lfcnassif], based on your point about the tika-app's -z option and its FileEmbeddedDocumentExtractor that just copies bytes from the InputStream to a file, I propose the following. I have a strong preference to treat each table as an embedded file, but if it isn't possible, it isn't possible.

So, the proposal for making use of classes that implement EmbeddedDocumentExtractor for each table:
A) If the EmbeddedDocumentExtractor is a parsing EmbeddedDocumentExtractor, the correct parser will be called, and it will grab a JDBC object from a wrapper/modification of TikaInputStream... it will not actually read the InputStream at all. The output will go into whatever handler is passed in.
B) If a client reads the bytes from the input stream, they'll get a UTF-8 encoded CSV InputStream, without BLOBs and CLOBs... the EmbeddedDocumentExtractor will be called for each individual BLOB and CLOB.
C) If a client uses the basic pattern of adding a Parser to the ParseContext, they'll get one big file, with div markup for the different tables.
D) If a client uses the RecursiveParserWrapper (not recommended for large dbs!), there will be one metadata object for each table, and one metadata object for each BLOB and CLOB... in short, potentially a large number of embedded documents.

I'll mock up this plan and attach a patch if this sounds reasonable. If this does work out, we might consider refactoring the PSTParser to treat individual emails in a similar way.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283988#comment-14283988 ] Luis Filipe Nassif commented on TIKA-1511: -- Hi [~talli...@apache.org]. First, you're doing a great job, thank you. I only want to help with some ideas, because I will not have time in the near future to help with the patch.
A) I think it will work, as the patch works now. But I think an inputStream that can not be read is a bit strange.
B) Could it be better to send a xHTML inputStream with markup to client instead of simple UTF-8 encoded CSV?
C) I agree, but it will work only if he adds the correct parser (eg TableParser or CompositeParser) to ParseContext, right?
D) I agree, that would be great.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280433#comment-14280433 ] Tim Allison commented on TIKA-1511: --- Hmm... This will fail if someone sends in a custom EmbeddedDocumentExtractor, because there is no way to pass the StatementTablePair to that interface via ParseContext. Some options:
1) We could go back to treating the db as one big doc, as we do with xls, but I think I'd prefer to treat each table as a separate doc.
2) We could get rid of the StatementTablePair hack, extract the text from each table into a String and then pass that into EmbeddedDocumentExtractor as the InputStream. The drawback to this is that we'd ignore the handler and lose potential {{<tr>}}/{{<td>}} markup.
Any ideas on this?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280553#comment-14280553 ] Luis Filipe Nassif commented on TIKA-1511: -- I think it will fail if someone sends in a custom EmbeddedDocExtractor (EDE), because it will probably try to read from the empty ByteArrayInputStream to get the table. The StatementTablePair will be there, but it could not be looked up in the ParseContext.
1) I prefer to handle each table as an embedded doc too, if it is possible. If not, let's go back.
2) Is it possible to generate an HTML representation of the tables and pass it into the EDE? By default, could it be handled by HtmlParser? Does HtmlParser currently extract embedded docs, like images? Can we insert the BLOBs into that HTML so that the HtmlParser will extract those BLOBs? If this approach is possible, we can use PipedWriter and PipedReader to avoid holding the entire HTML/tables, possibly huge ones, in memory.
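Here is a minimal sketch of the PipedWriter/PipedReader idea. A producer thread writes rows as an HTML table into the pipe while the caller reads from the other end, so only the pipe's small buffer is held in memory. {{PipedTableSketch}} is hypothetical. The rows are assumed to be plain strings, and only basic HTML escaping is done.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical sketch: stream a table as HTML without building the whole document in memory. */
final class PipedTableSketch {
    static Reader streamTable(List<String[]> rows) throws IOException {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer); // connect before the producer starts writing
        Thread producer = new Thread(() -> {
            // Closing the writer signals end-of-stream to the reader.
            try (PipedWriter w = writer) {
                w.write("<table>");
                for (String[] row : rows) {
                    w.write("<tr>");
                    for (String cell : row) {
                        w.write("<td>" + escape(cell) + "</td>");
                    }
                    w.write("</tr>");
                }
                w.write("</table>");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.setDaemon(true);
        producer.start();
        return reader; // the caller consumes this, e.g. by handing it to an HTML parser
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```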
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280829#comment-14280829 ] Tim Allison commented on TIKA-1511: --- Yes, the HTML representation is generated by wrapping the handler in an XHTMLHandler as other parsers do, and in v2 of the patch, this actually works. No need to get HtmlParser involved. If you want plain text, use a BodyContentHandler. I may be missing your point on HtmlParser and PipedReader/Writer. I added two tests that just print out the output from the standard AutoDetectParser and from a RecursiveParserWrapper that wraps AutoDetect... let me know what you think.
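The sketch below shows "wrapping the handler" instead of producing HTML text. The table is emitted as {{table}}/{{tr}}/{{td}} SAX events to whatever ContentHandler the caller passes in. The caller then decides whether to keep the markup or just the text. It uses only the JDK's {{org.xml.sax}} types. {{TableSaxSketch}} and its text-only handler are hypothetical stand-ins, not Tika's XHTMLContentHandler or BodyContentHandler.

```java
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: emit table rows as XHTML SAX events to a caller-supplied handler. */
final class TableSaxSketch {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";
    private static final AttributesImpl NO_ATTRS = new AttributesImpl(); // never modified

    static void emitTable(ContentHandler handler, List<String[]> rows) throws SAXException {
        start(handler, "table");
        for (String[] row : rows) {
            start(handler, "tr");
            for (String cell : row) {
                start(handler, "td");
                char[] chars = cell.toCharArray();
                handler.characters(chars, 0, chars.length);
                end(handler, "td");
            }
            end(handler, "tr");
        }
        end(handler, "table");
    }

    private static void start(ContentHandler h, String name) throws SAXException {
        h.startElement(XHTML, name, name, NO_ATTRS);
    }

    private static void end(ContentHandler h, String name) throws SAXException {
        h.endElement(XHTML, name, name);
    }

    /** Hypothetical text-only handler: keeps characters, turns cell/row ends into tab/newline. */
    static final class TextOnly extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("td".equals(localName)) text.append('\t');
            if ("tr".equals(localName)) text.append('\n');
        }
    }
}
```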
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281086#comment-14281086 ] Luis Filipe Nassif commented on TIKA-1511: -- If the InputStream (pseudo-InputStream) received by the EmbeddedDocExtractor cannot be read, I think using an EDE is not useful. How will this approach work with the TikaCli --extract option? My original idea was to support a use case like TikaCli --extract... Now I think this extraction of tables to files can be done by handling the db as one big doc and using a ContentHandlerDecorator that splits the XHTML output at table boundaries. Each XHTML segment can be converted to a byte[] (if small) and then to a ByteArrayInputStream that can be passed to an EmbeddedDocExtractor, if one is set on the ParseContext. If none is set, the ContentHandlerDecorator does not need to split tables and can fall back to the default behavior. A custom EDE can then extract tables to files if desired. So now I think we could go with the big-doc approach. What do you think?
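Here is a minimal sketch of splitting the XHTML output at table boundaries. It uses a plain SAX {{DefaultHandler}} from the JDK as a stand-in for a Tika ContentHandlerDecorator. {{TableSplitter}} is hypothetical and drops element attributes for brevity. Each finished table becomes one UTF-8 {{byte[]}}, which could then be wrapped in a {{ByteArrayInputStream}} for an EmbeddedDocumentExtractor.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that cuts XHTML at <table> boundaries into byte[] segments. */
final class TableSplitter extends DefaultHandler {
    final List<byte[]> segments = new ArrayList<>();
    private StringBuilder current; // non-null only while inside an outermost <table>
    private int depth;             // nesting depth of <table> elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String name = localName.isEmpty() ? qName : localName;
        if ("table".equals(name) && depth++ == 0) {
            current = new StringBuilder();
        }
        if (current != null) {
            current.append('<').append(name).append('>'); // attributes dropped in this sketch
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return; // outside any table: nothing to record
        }
        String name = localName.isEmpty() ? qName : localName;
        current.append("</").append(name).append('>');
        if ("table".equals(name) && --depth == 0) {
            segments.add(current.toString().getBytes(StandardCharsets.UTF_8));
            current = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(escape(new String(ch, start, length)));
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }
}
```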
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279748#comment-14279748 ] Tim Allison commented on TIKA-1511: --- Sounds good; yes, I think the user will have to handcraft depth handling for now. Question for the community... To call the EmbeddedDocumentExtractor for each table, I can't just pass it an InputStream: there is no InputStream, just a Connection and a table name against which to run {{select * from tablename}}. One solution would be to create a special mime-type, tika-internal/jdbc-table, and then a JDBCTableParser that supports that mime-type, but pulls a ConnectionTableNamePair (or something?) from the ParseContext. Other ideas?
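Here is a minimal sketch of the special-mime-type idea, assuming a ParseContext-style registry keyed by class. {{SimpleParseContext}}, {{ConnectionTableNamePair}} and {{JdbcTableParserSketch}} are hypothetical stand-ins. The pair holds a connection id string instead of a real {{java.sql.Connection}}, and the table name is not escaped.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry keyed by class, in the spirit of a ParseContext. */
final class SimpleParseContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        return key.cast(entries.get(key)); // null if nothing was registered
    }
}

/** Hypothetical pair; a real version would hold an open java.sql.Connection. */
final class ConnectionTableNamePair {
    final String connectionId;
    final String tableName;

    ConnectionTableNamePair(String connectionId, String tableName) {
        this.connectionId = connectionId;
        this.tableName = tableName;
    }
}

final class JdbcTableParserSketch {
    static final String TABLE_TYPE = "tika-internal/jdbc-table";

    /** Returns the query for the table handle found in the context, for the internal type only. */
    static String queryFor(String mediaType, SimpleParseContext context) {
        if (!TABLE_TYPE.equals(mediaType)) {
            throw new IllegalArgumentException("unsupported type: " + mediaType);
        }
        ConnectionTableNamePair pair = context.get(ConnectionTableNamePair.class);
        if (pair == null) {
            // This is the failure mode for extractors that never see the context entry.
            throw new IllegalStateException("no table handle in context");
        }
        return "select * from " + pair.tableName;
    }
}
```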
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277044#comment-14277044 ] Luis Filipe Nassif commented on TIKA-1511: -- 1) I vote to handle each table as a separate/embedded item with the EmbeddedDocumentExtractor. If the user does not set an EmbeddedDocumentExtractor in the ParseContext, the parser should fall back to a ParsingEmbeddedDocumentExtractor that will simply append all tables with div markup. So the parser will be more flexible. 2) I think the same can be applied here.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277080#comment-14277080 ] Konstantin Gribov commented on TIKA-1511: - [~talli...@mitre.org], working with tables as separate files looks good. Maybe also migrate Excel parsing to the same behavior. Consistent behavior is good from the principle-of-least-surprise point of view. Treating BLOBs as embedded documents gives the library user the ability to configure their detection, parsing and extraction via {{ParseContext}}, AFAIK. E.g. a Tika user can just detect the MIME type (and, maybe, metadata) when parsing a database table. But this leads to one issue: the user may want different behavior for different levels of embedded documents, e.g. parse the first level (table) and only extract metadata for the second (a BLOB in some field). For me it'll be a real case in some projects. In such a case the user may want to pass some {{ParseContext}}, or a factory for it, to the {{EmbeddedDocumentExtractor}}. So, such an improvement can be done after.
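Here is a minimal sketch of per-level behavior. It assumes a tree of embedded documents (database, then tables, then BLOBs) and a hypothetical policy that fully parses tables but only extracts metadata for BLOBs. {{DepthPolicySketch}} is illustrative, not a Tika API. The depth is tracked by the walker itself.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical depth-aware handling of embedded documents. */
final class DepthPolicySketch {
    enum Action { PARSE, METADATA_ONLY }

    /** Hypothetical embedded-document node: the db, its tables, their BLOBs. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Policy: parse first-level documents (tables); only take metadata from deeper ones (BLOBs). */
    static Action actionFor(int depth) {
        return depth <= 1 ? Action.PARSE : Action.METADATA_ONLY;
    }

    /** Walks the embedded documents, logging "name@depth=ACTION" for each one visited. */
    static void walk(Node node, int depth, List<String> log) {
        for (Node child : node.children) {
            int childDepth = depth + 1;
            Action action = actionFor(childDepth);
            log.add(child.name + "@" + childDepth + "=" + action);
            if (action == Action.PARSE) {
                walk(child, childDepth, log); // only recurse into documents we actually parse
            }
        }
    }
}
```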
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275605#comment-14275605 ] Luis Filipe Nassif commented on TIKA-1511: -- I think the JDBC-based AbstractClass is a great route!
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275703#comment-14275703 ] Konstantin Gribov commented on TIKA-1511: - [~lfcnassif], +1. IMHO, ManifoldCF connectors are quite a heavy dependency. {{tika-app.jar}} is about 30MiB now.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275217#comment-14275217 ]
Tim Allison commented on TIKA-1511:

Thank you, [~grossws]! Two questions:
1) On how to exclude the native libs: is it OK to require that people re-bundle, that is, remove the dependency from the pom and build from scratch? Is there a cleaner method?
2) Would it be better to require users who want SQLite3 parsing to add xerial to their classpath?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275286#comment-14275286 ]
Konstantin Gribov commented on TIKA-1511:

The usual way is to exclude the Maven dependency and check for the presence of some {{xerial}} class before using it in the appropriate Tika parser (i.e. call {{Class.forName("org.sqlite.JDBC")}} and catch {{ClassNotFoundException}}). I don't know how consistently {{tika-parsers}} uses this approach.

Native libs are usually stored in the same jar (built for all supported platforms), so excluding {{sqlite-jdbc.jar}} also prevents the SQLite native library from being loaded from it.

E.g. if I don't need, say, the netcdf parsers when invoking Tika, I can add a snippet like this to my {{pom.xml}}:
{code:xml}
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.6</version>
  <exclusions>
    <exclusion>
      <groupId>edu.ucar</groupId>
      <artifactId>netcdf</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}
So a Tika library user doesn't need to rebuild tika-parsers and store the result somewhere, and can use the prebuilt Tika release from Maven Central.

The same pattern can be used with other libs, splitting them into two buckets:
- libs with an Apache-compatible license, which can be included in the {{tika-parsers.jar}} artifact;
- libs with a license that prevents packaging them with Tika, plus documentation about which parsers/detectors become available if the user adds them to the classpath.

This approach is generic and not specific to libs with JNI. E.g. it allows someone to use a proprietary or copyleft (GNU GPL/LGPL) library if that is allowed from the legal side. I'm not a lawyer, so I don't know whether a compile-time dependency on a library with an Apache-incompatible license would infringe someone's copyright or not.

Disclaimer: I'm not a lawyer. My thoughts above aren't legal advice. I think legal advice from the ASF should be formally obtained before including even optional dependencies on third-party libs whose licenses are incompatible with the Apache License.
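The {{Class.forName}} availability check could look roughly like the sketch below. {{OptionalDependency}} and {{isAvailable}} are hypothetical names; {{org.sqlite.JDBC}} is xerial's driver class. Unlike the one-argument form suggested above, this variant uses {{Class.forName(name, false, loader)}} so that probing does not run the driver's static initializer, and it also treats a {{LinkageError}} (e.g. a broken jar) as "not available"; both are design choices of this sketch, not part of the original suggestion.

```java
// Minimal sketch of the optional-dependency check described above:
// probe for a driver class on the classpath before enabling a parser.
public final class OptionalDependency {

    private OptionalDependency() {
    }

    /**
     * Returns true if the named class can be loaded. The class is not
     * initialized, so a driver's static initializer (which may register the
     * driver or touch native code) does not run just because we probed for it.
     */
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.sqlite.JDBC is only present if sqlite-jdbc was not excluded in the pom.
        if (isAvailable("org.sqlite.JDBC")) {
            System.out.println("sqlite-jdbc found; SQLite parsing can be enabled");
        } else {
            System.out.println("sqlite-jdbc not found; SQLite parsing disabled");
        }
    }
}
```

A parser could call such a check once, for example when reporting its supported types, and simply advertise no types when the jar has been excluded.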
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275182#comment-14275182 ]
Konstantin Gribov commented on TIKA-1511:

JNI can potentially cause issues in webapp containers/application servers and in environments with a security manager turned on. If we use native libs in Tika, I think this should at least be mentioned in the docs, together with instructions on how to exclude them.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275112#comment-14275112 ]
Tim Allison commented on TIKA-1511:

Thank you for looking into that. I like the bundling of native libs so that users don't have to worry about them. Do you see any potential problems from a technical standpoint with xerial's wrapper/jar?

[~gagravarr], [this|http://bitbucket.org/xerial/sqlite-jdbc] looks good to me. Do you still recommend checking with Legal?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275361#comment-14275361 ]
Tim Allison commented on TIKA-1511:

Completely agree with this... that was the plan, especially for those that are explicitly not Apache.

{quote}
(i.e. call Class.forName("org.sqlite.JDBC") and catch ClassNotFoundException)
{quote}

On ucar, got it. I'll follow that model (the exclude statements in the app, server and bundle poms) for SQLite if we get a negative decision from LEGAL, and for any other DB drivers/native code that are explicitly not Apache. Thank you, again!
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275485#comment-14275485 ]
Luis Filipe Nassif commented on TIKA-1511:

Another library option is [sqlite4java|https://code.google.com/p/sqlite4java/]. It is not a JDBC driver, but it also depends on native libs. Maybe a JDBC driver like xerial's would be better, because we could be database-independent and reuse the code for other formats (dbf, mdb, ...)?
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275508#comment-14275508 ]
Nick Burch commented on TIKA-1511:

If we're going to do a general JDBC option, maybe we'd be better off having an optional module that just wraps Apache ManifoldCF? ManifoldCF provides connectors/extractors for JDBC, amongst others.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274159#comment-14274159 ]
Nick Burch commented on TIKA-1511:

Just to be sure: since SQLite doesn't show up in the [Apache Legal FAQ list|http://www.apache.org/legal/resolved.html], it'd probably be worth raising a LEGAL JIRA (link from [the legal page|http://www.apache.org/legal/resolved.html]) just to get confirmation that it's fine to use, and to clarify what (if any) NOTICE entry is needed for it.
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274135#comment-14274135 ]
Tim Allison commented on TIKA-1511:

Agreed on the license. I'm able to create and write to an SQLite db with just the jar from Maven:
{noformat}
<dependency>
  <groupId>org.xerial</groupId>
  <artifactId>sqlite-jdbc</artifactId>
  <version>3.8.7</version>
</dependency>
{noformat}
I don't think I have native libs kicking around my system somewhere, or do I? This will add another 4 MB to tika-app/tika-server, but I think that it is worth it...
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274612#comment-14274612 ]
Luis Filipe Nassif commented on TIKA-1511:

Yes, there are native libs for Windows, Mac and Linux packed into xerial's sqlite-jdbc-3.8.7.jar, but there are other wrappers if that is a problem. The license for xerial's sqlite-jdbc is Apache v2.
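To make "native libs for several platforms packed into one jar" concrete: such a jar typically chooses which bundled library to extract at runtime from the {{os.name}} and {{os.arch}} system properties. The sketch below only illustrates that idea; the {{/native/<OS>/<arch>/}} folder layout, the class {{NativeLibPath}} and the library base name {{sqlitejdbc}} are hypothetical assumptions, not a description of sqlite-jdbc's actual internals.

```java
import java.util.Locale;

// Hypothetical sketch of how a "fat" JDBC jar might pick which bundled native
// library to use: normalize os.name / os.arch into a resource folder name.
// The /native/<OS>/<arch>/ layout is an illustrative assumption only.
public final class NativeLibPath {

    private NativeLibPath() {
    }

    public static String osFolder(String osName) {
        String n = osName.toLowerCase(Locale.ROOT);
        if (n.startsWith("windows")) return "Windows";
        if (n.startsWith("mac") || n.startsWith("darwin")) return "Mac";
        if (n.startsWith("linux")) return "Linux";
        throw new UnsupportedOperationException("no bundled native library for " + osName);
    }

    public static String archFolder(String osArch) {
        String a = osArch.toLowerCase(Locale.ROOT);
        if (a.equals("amd64") || a.equals("x86_64")) return "x86_64";
        if (a.equals("x86") || a.matches("i[3-6]86")) return "x86";
        return a; // anything else (e.g. "aarch64") is passed through unchanged
    }

    /** Resource folder inside the jar for the given platform. */
    public static String resourceDir(String osName, String osArch) {
        return "/native/" + osFolder(osName) + "/" + archFolder(osArch) + "/";
    }

    public static void main(String[] args) {
        String dir = resourceDir(System.getProperty("os.name"), System.getProperty("os.arch"));
        // mapLibraryName adds the platform's prefix/suffix, e.g. libfoo.so or foo.dll;
        // "sqlitejdbc" is a hypothetical base name.
        System.out.println(dir + System.mapLibraryName("sqlitejdbc"));
    }
}
```

If no folder matches the current platform, a real loader would have to fall back to a system-installed library or fail, which is one reason excluding the jar entirely (as discussed above) is useful in restricted environments.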
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273869#comment-14273869 ]
Tim Allison commented on TIKA-1511:

Does anyone see licensing problems with bundling the SQLite dependency? It isn't Apache v2, but what we'd bundle isn't licensed at all: SQLite is in the public domain ([link|https://www.sqlite.org/copyright.html]). I don't see a problem, but wanted to check whether anyone has any concerns.

Thank you for opening this issue!
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273905#comment-14273905 ]
Luis Filipe Nassif commented on TIKA-1511:

I don't see any problems either. I think public domain is more liberal than Apache v2, because the authors have relinquished their copyright. But SQLite needs native libs. Could that be a problem?