[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070693#comment-16070693 ]

Wendy Tao commented on SOLR-7925:
---------------------------------

Hi Jan,

Thank you for keeping me posted on the new update. I finally switched to indexing from the database and gave up on indexing gzip files.

Thanks!

> Implement indexing from gzip format file
> ----------------------------------------
>
>                 Key: SOLR-7925
>                 URL: https://issues.apache.org/jira/browse/SOLR-7925
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 5.2.1
>            Reporter: Song Hyonwoo
>            Priority: Minor
>              Labels: patch
>         Attachments: SOLR-7925.patch
>
>
> This will support updates from gzipped files in JSON, XML and CSV format.
> The request path will be "update/compress/gzip" instead of "update", with the
> "update.contentType" parameter and "Content-Type: application/gzip" as a
> header field.
> The following is a sample request using the curl command (use --data-binary,
> not --data):
> curl "http://localhost:8080/solr/collection1/update/compress/gzip?update.contentType=application/json=true; -H 'Content-Type: application/gzip' --data-binary @data.json.gz
> To activate this function, add the following request handler information
> to solrconfig.xml:
> class="org.apache.solr.handler.CompressedUpdateRequestHandler">
> application/gzip

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
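For readers who would rather script the request than use curl, the following is a minimal Python sketch of the same call against the handler path described in the issue. It only builds the request and does not send it. The `commit=true` parameter, the function name and the sample document are assumptions for illustration; the query string in the archived curl sample appears to have lost a parameter name, so it is not copied literally.

```python
import gzip
import json
import urllib.request


def build_gzip_update_request(core_url, docs):
    """Build (but do not send) a POST to the patch's gzip update handler.

    core_url is the Solr core URL, e.g. http://localhost:8080/solr/collection1.
    The path and the update.contentType parameter come from the issue text;
    commit=true is an assumption added for illustration.
    """
    # Compress the JSON payload in memory, as `gzip data.json` would on disk.
    body = gzip.compress(json.dumps(docs).encode("utf-8"))
    url = (
        core_url
        + "/update/compress/gzip"
        + "?update.contentType=application/json&commit=true"
    )
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/gzip"},
        method="POST",
    )


# Hypothetical sample document, for illustration only.
req = build_gzip_update_request(
    "http://localhost:8080/solr/collection1",
    [{"id": "1", "title": "hello"}],
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs
# without a Solr server that has the patch applied.
```

The body is passed as raw bytes, which corresponds to curl's `--data-binary`: the compressed stream must reach the server unmodified.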
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070624#comment-16070624 ]

Jan Høydahl commented on SOLR-7925:
-----------------------------------

See SOLR-10981, which probably has a smoother solution using the {{Content-Encoding: gzip}} header.
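A rough Python sketch of what a client request could look like under the {{Content-Encoding: gzip}} approach: the regular update path is used, the Content-Type keeps describing the real payload, and only the encoding header signals compression. This assumes a server that decodes gzip-encoded request bodies, which is what the comment suggests SOLR-10981 is about; the URL, function name and payload here are hypothetical.

```python
import gzip
import urllib.request


def build_content_encoding_request(update_url, payload, content_type):
    """Build (but do not send) a POST whose body is gzip-encoded.

    Assumes the server understands Content-Encoding: gzip on requests.
    """
    return urllib.request.Request(
        update_url,
        data=gzip.compress(payload),
        headers={
            "Content-Type": content_type,   # type of the *uncompressed* data
            "Content-Encoding": "gzip",     # how the body is encoded in transit
        },
        method="POST",
    )


# Hypothetical URL and payload, for illustration only.
req = build_content_encoding_request(
    "http://localhost:8983/solr/collection1/update",
    b'[{"id": "1"}]',
    "application/json",
)
```

Compared with a dedicated handler path, this keeps the distinction between what the data is (Content-Type) and how it travels (Content-Encoding).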
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291355#comment-15291355 ]

Wendy Tao commented on SOLR-7925:
---------------------------------

Hi Song Hyonwoo,

Do you have sample .xml.gz files that I can try? I tried this patch with an .xml.gz file. I exported your classes into a .jar file and placed it under the following directory:

/opt/solr-5.3.0/server/solr-webapp/webapp/WEB-INF/lib

But for some reason, it didn't index the data. Here are the command and the response. Thanks!

$ curl "http://localhost:8983/solr/rcsb/update/compress/gzip?update.contentType=application/xml=true; -H 'Content-Type: application/gzip' --data-binary @1hhq-noatom.xml.gz
0106
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287001#comment-15287001 ]

Wendy Tao commented on SOLR-7925:
---------------------------------

Hi Song,

I am interested in applying SOLR-7925.patch to Solr 5.3 for indexing .xml.gz files. Could you let me know which Solr project, package or .jar file I should apply the patch to?

Thanks!

--Wendy
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121725#comment-15121725 ]

Jan Høydahl commented on SOLR-7925:
-----------------------------------

Does anyone know if it is easy to configure Jetty to automatically decompress a gzip stream before it even hits Solr?
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121815#comment-15121815 ]

Uwe Schindler commented on SOLR-7925:
-------------------------------------

Unfortunately, the official support for gzip/deflate "Content-Encoding" (not to be confused with Content-Type) only allows compressing responses: https://www.eclipse.org/jetty/documentation/current/gzip-filter.html

The HTTP standard does not have an official way for the client to send compressed content (as far as I know). The reason is that the server cannot announce this possibility before the client sends data. When serving responses, the client sends an Accept-Encoding header containing the supported compression formats, and the server responds with one from this list (after finding the intersection of its capabilities with the client's request).

This is different with HTTP 2.0, where compression is part of the game (also when sending the HTTP headers).
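The response-side negotiation described in this comment can be sketched in a few lines of Python. This is a simplified illustration, not Jetty's implementation: the function name is hypothetical, the `*` wildcard is ignored, and q-values are only used to drop encodings the client marks with `q=0`.

```python
def choose_response_encoding(accept_encoding, server_supported):
    """Pick a response encoding from the client's Accept-Encoding header.

    server_supported is the server's list of encodings in preference order.
    Returns the first server-preferred encoding the client accepts, or None
    (send the response uncompressed) when there is no overlap.
    """
    offered = set()
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not accepted"
        if quality > 0:
            offered.add(token)
    # Intersection of client offer and server capabilities, server order wins.
    for encoding in server_supported:
        if encoding in offered:
            return encoding
    return None


print(choose_response_encoding("deflate, gzip", ["gzip", "deflate"]))
```

For a request body there is no equivalent prior announcement from the server, which is the asymmetry the comment points out: the client has to know out of band that the server will decode what it sends.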
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122333#comment-15122333 ]

Jan Høydahl commented on SOLR-7925:
-----------------------------------

Thanks for clarifying, Uwe! I could not find any reference to request body compression in the HTTP 2.0 spec, only to request headers...

However, httpd's mod_deflate also provides a filter to decompress compressed requests: "The mod_deflate module also provides a filter for decompressing a gzip compressed request body. In order to activate this feature you have to insert the DEFLATE filter into the input filter chain", see https://httpd.apache.org/docs/2.4/mod/mod_deflate.html. I guess that's why I started looking for a Jetty filter or plugin doing the same. The Solr client could then post the request using Content-Encoding: gzip.
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791771#comment-14791771 ]

Song Hyonwoo commented on SOLR-7925:
------------------------------------

Thanks for your comment.
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791768#comment-14791768 ]

Song Hyonwoo commented on SOLR-7925:
------------------------------------

For simplicity, maybe it is better that clients handle this kind of functionality, but the benefit of indexing from a gzipped file is that it saves network resources when transferring data to a remote Solr. In our test, indexing a gzipped file was 25% faster than indexing the original file on limited network bandwidth.
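How much there is to gain depends on how well the payload compresses, which is easy to check before changing anything on the server. The sketch below measures this for a synthetic JSON payload; the documents and the function name are made up for illustration, and because they are highly repetitive they compress far better than typical real data. It does not attempt to reproduce the 25% figure from the test mentioned above.

```python
import gzip
import json


def compression_report(payload):
    """Return raw size, gzip size and their ratio for a non-empty payload."""
    compressed = gzip.compress(payload)
    return {
        "raw_bytes": len(payload),
        "gzip_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),
    }


# Synthetic, deliberately repetitive documents (illustration only).
docs = [
    {"id": str(i), "title": f"document number {i}", "category": "example"}
    for i in range(1000)
]
report = compression_report(json.dumps(docs).encode("utf-8"))
print(report)
```

On a bandwidth-limited link, transfer time scales roughly with the number of bytes sent, so the ratio gives a first estimate of the saving, before accounting for compression and decompression cost.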
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803012#comment-14803012 ]

Jan Høydahl commented on SOLR-7925:
-----------------------------------

It could also be possible to add a Servlet Filter to Jetty which handles the decompression generically if the correct HTTP header is set on the request...
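The logic such a filter would apply is small. The following Python sketch shows only that logic, under the assumption that the header in question is `Content-Encoding: gzip`; it is not a Servlet Filter and does not use any Jetty API, and the function name is hypothetical. A real filter would wrap the request's input stream rather than buffer the whole body as done here.

```python
import gzip


def decode_request_body(headers, body):
    """Undo gzip Content-Encoding on a request, if present.

    headers is a dict of header name -> value, body is the raw request bytes.
    Requests without Content-Encoding: gzip are returned untouched, so the
    downstream handler never needs to know compression was involved.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get("content-encoding", "").strip().lower() != "gzip":
        return headers, body

    plain = gzip.decompress(body)
    # Drop the encoding header and fix the length for the decoded body.
    new_headers = {
        name: value
        for name, value in headers.items()
        if name.lower() not in ("content-encoding", "content-length")
    }
    new_headers["Content-Length"] = str(len(plain))
    return new_headers, plain


compressed = gzip.compress(b'{"id": "1"}')
headers, body = decode_request_body(
    {"Content-Type": "application/json", "Content-Encoding": "gzip"},
    compressed,
)
```

Because the Content-Type is left alone, the existing update handlers would see an ordinary JSON, XML or CSV request.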
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14768899#comment-14768899 ]

Jan Høydahl commented on SOLR-7925:
-----------------------------------

Not sure if I agree that this belongs in Solr. Clients should handle streaming from various sources, including compressed files...
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746241#comment-14746241 ]

Chris Eldredge commented on SOLR-7925:
--------------------------------------

Sounds potentially very useful when posting large amounts of data to Solr.
[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file
[ https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738274#comment-14738274 ]

Song Hyonwoo commented on SOLR-7925:
------------------------------------

This patch helps to save network bandwidth when you send update files to a remote Solr server. If you frequently need to send big files to a remote Solr, you can send them in gzipped format with this patch. It is especially useful if your system's network is already busy.

You can test it like this:

$ cd solr/core
$ ant test -Dtestcase=GZipCompressedUpdateRequestHandlerTest

Thanks.