[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2017-06-30 Thread Wendy Tao (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070693#comment-16070693
 ] 

Wendy Tao commented on SOLR-7925:
-

Hi Jan,

Thank you for keeping me posted with the new update. I eventually switched to 
indexing the database directly and gave up on indexing gzip files.  Thanks! 

> Implement indexing from gzip format file
> ----------------------------------------
>
>                 Key: SOLR-7925
>                 URL: https://issues.apache.org/jira/browse/SOLR-7925
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 5.2.1
>            Reporter: Song Hyonwoo
>            Priority: Minor
>              Labels: patch
>         Attachments: SOLR-7925.patch
>
>
> This adds support for updates from gzipped JSON, XML and CSV files.
> The request path is "update/compress/gzip" instead of "update", used with 
> the "update.contentType" parameter and a "Content-Type: application/gzip" 
> header field.
> The following is a sample request using the curl command (use 
> --data-binary, not --data):
> curl 
> "http://localhost:8080/solr/collection1/update/compress/gzip?update.contentType=application/json&commit=true"
>  -H 'Content-Type: application/gzip' --data-binary @data.json.gz
> To activate this feature, add the following request handler definition 
> to solrconfig.xml:
>class="org.apache.solr.handler.CompressedUpdateRequestHandler">
> 
>   application/gzip
> 
>   



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2017-06-30 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070624#comment-16070624
 ] 

Jan Høydahl commented on SOLR-7925:
---

See SOLR-10981, which probably offers a smoother solution using the 
{{Content-Encoding: gzip}} header.




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2016-05-19 Thread Wendy Tao (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291355#comment-15291355
 ] 

Wendy Tao commented on SOLR-7925:
-

Hi Song Hyonwoo,

Do you have sample .xml.gz files that I can try? 

I tried this patch with an .xml.gz file. I exported your classes into a .jar 
file and placed it in the following directory:
/opt/solr-5.3.0/server/solr-webapp/webapp/WEB-INF/lib

But for some reason, it didn't index any data. Here are the command and the 
response. Thanks!

$ curl 
"http://localhost:8983/solr/rcsb/update/compress/gzip?update.contentType=application/xml&commit=true"
 -H 'Content-Type: application/gzip' --data-binary @1hhq-noatom.xml.gz



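<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">106</int></lst>
</response>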






[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2016-05-17 Thread Wendy Tao (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287001#comment-15287001
 ] 

Wendy Tao commented on SOLR-7925:
-

Hi Song,

I am interested in applying SOLR-7925.patch to Solr 5.3 to index .xml.gz 
files. Could you let me know which Solr project, package, or .jar file I 
should apply the patch to?  Thanks! --Wendy





[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2016-01-28 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121725#comment-15121725
 ] 

Jan Høydahl commented on SOLR-7925:
---

Does anyone know if it is easy to configure Jetty to automatically decompress 
a gzip stream before it even hits Solr?




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2016-01-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121815#comment-15121815
 ] 

Uwe Schindler commented on SOLR-7925:
-

Unfortunately, Jetty's official support for gzip/deflate "Content-Encoding" 
(not to be confused with Content-Type) only allows compressing responses: 
https://www.eclipse.org/jetty/documentation/current/gzip-filter.html

The HTTP standard does not have an official way for the client to send 
compressed content (as far as I know). The reason is that the server cannot 
announce this capability before the client sends data. When serving responses, 
the client sends an Accept-Encoding header containing the supported compression 
formats, and the server responds with one from this list (after finding the 
intersection of its capabilities with the client's request).
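
The response-side negotiation described above can be sketched roughly as 
follows. This is a simplified Python illustration, not Jetty's or Solr's code: 
the function names are made up for the example, the "*" wildcard is ignored, 
and the server simply takes its own first preference that the client accepts 
with a non-zero q-value.

```python
from typing import Dict, Sequence


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Parse an Accept-Encoding value into {coding: q-value} (simplified)."""
    accepted: Dict[str, float] = {}
    for item in header.split(","):
        name, _, params = item.strip().partition(";")
        if not name:
            continue
        q = 1.0  # a coding without an explicit q-value defaults to 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[name.strip().lower()] = q
    return accepted


def choose_encoding(accept_encoding: str, server_supported: Sequence[str]) -> str:
    """Return the first server-preferred coding the client accepts."""
    accepted = parse_accept_encoding(accept_encoding)
    for coding in server_supported:
        if accepted.get(coding, 0.0) > 0.0:
            return coding
    return "identity"  # no overlap: send the response uncompressed


if __name__ == "__main__":
    print(choose_encoding("deflate, gzip;q=0.5", ["gzip", "deflate"]))
```

There is no equivalent step for request bodies: the client has to pick an 
encoding before it has heard anything from the server, which is the gap the 
paragraph above points out.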

This is different with HTTP 2.0, where compression is part of the protocol 
(also when sending the HTTP headers).




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2016-01-28 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122333#comment-15122333
 ] 

Jan Høydahl commented on SOLR-7925:
---

Thanks for clarifying, Uwe!

I could not find any reference to request body compression in the HTTP 2.0 
spec, only to request header compression...
However, httpd's mod_deflate also provides a filter to decompress compressed 
requests: "The mod_deflate module also provides a filter for decompressing a 
gzip compressed request body. In order to activate this feature you have to 
insert the DEFLATE filter into the input filter chain", see 
https://httpd.apache.org/docs/2.4/mod/mod_deflate.html. 

I guess that's why I started looking for a Jetty filter or plugin that does the 
same. The Solr client could then post the request using Content-Encoding: gzip.
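
As a rough illustration of what such a client request could look like, here 
is a minimal Python sketch. The URL, the collection name and the sample 
document are placeholders, and the sketch assumes something in front of Solr 
(a filter, a proxy, or Solr itself) actually decodes Content-Encoding: gzip on 
requests, which nothing in this thread confirms for a stock installation. The 
request is only built, not sent.

```python
import gzip
import json
import urllib.request


def build_gzip_update_request(url: str, docs: list) -> urllib.request.Request:
    """Build a POST whose JSON body is gzip-compressed.

    Content-Type still describes the payload (JSON); Content-Encoding
    says how the bytes on the wire were encoded.
    """
    body = gzip.compress(json.dumps(docs).encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
        },
    )


# Placeholder URL and document, for illustration only.
req = build_gzip_update_request(
    "http://localhost:8983/solr/collection1/update",
    [{"id": "1", "title": "example"}],
)

if __name__ == "__main__":
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # To actually send it: urllib.request.urlopen(req)
```

Unlike the "update/compress/gzip" handler in the patch, this keeps the normal 
update path and the real Content-Type, so the compression stays a transport 
detail.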




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2015-09-17 Thread Song Hyonwoo (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791771#comment-14791771
 ] 

Song Hyonwoo commented on SOLR-7925:


Thanks for your comment.




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2015-09-17 Thread Song Hyonwoo (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791768#comment-14791768
 ] 

Song Hyonwoo commented on SOLR-7925:


For simplicity, it may be better for clients to handle this kind of 
functionality, but the benefit of indexing a gzipped file is that it saves 
network resources when transferring data to a remote Solr.

In our test, indexing a gzipped file was 25% faster than indexing the 
original file on limited network bandwidth.
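
How much there is to gain depends heavily on the data and the network. A 
small Python sketch for checking the size reduction on a given payload is 
shown below; the documents are synthetic and repetitive, so the ratio it 
prints says nothing about the test mentioned above.

```python
import gzip
import json

# Synthetic, repetitive documents (an assumption for the example);
# real data may compress better or worse.
docs = [
    {"id": str(i), "title": f"document number {i}", "category": "example"}
    for i in range(1000)
]

raw = json.dumps(docs).encode("utf-8")
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)

if __name__ == "__main__":
    print(f"raw={len(raw)} bytes, gzipped={len(compressed)} bytes, ratio={ratio:.2f}")
```

Fewer bytes on the wire only translates into faster indexing when the network 
is the bottleneck, since both sides pay some CPU for compression and 
decompression.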




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2015-09-17 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803012#comment-14803012
 ] 

Jan Høydahl commented on SOLR-7925:
---

It could also be possible to add a Servlet Filter to Jetty which handles the 
decompression generically if the correct HTTP header is set on the request...
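
The idea of a generic decompressing filter can be sketched as follows. This 
is a Python WSGI middleware used as a stand-in for a Java Servlet Filter, not 
Jetty or Solr code; the class and helper names are invented for the example, 
and it assumes the "correct HTTP header" is Content-Encoding: gzip.

```python
import gzip
import io


class GzipRequestMiddleware:
    """Decompress the request body when Content-Encoding is gzip."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        encoding = environ.get("HTTP_CONTENT_ENCODING", "").strip().lower()
        if encoding == "gzip":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = gzip.decompress(environ["wsgi.input"].read(length))
            # Hand the wrapped application a plain, already-decoded body.
            environ = dict(environ)
            environ["wsgi.input"] = io.BytesIO(body)
            environ["CONTENT_LENGTH"] = str(len(body))
            del environ["HTTP_CONTENT_ENCODING"]
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    """Toy application: returns whatever body it receives."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [environ["wsgi.input"].read(length)]


def call(app, body, extra_headers=None):
    """Invoke a WSGI app in-process with a minimal environ."""
    environ = {
        "REQUEST_METHOD": "POST",
        "wsgi.input": io.BytesIO(body),
        "CONTENT_LENGTH": str(len(body)),
    }
    environ.update(extra_headers or {})
    return b"".join(app(environ, lambda status, headers: None))


if __name__ == "__main__":
    app = GzipRequestMiddleware(echo_app)
    payload = b"<add><doc><field name='id'>1</field></doc></add>"
    print(call(app, gzip.compress(payload), {"HTTP_CONTENT_ENCODING": "gzip"}))
```

The sketch reads the whole body into memory and puts no limit on the 
decompressed size; a real filter would want to stream and to cap the output 
to guard against oversized or malicious payloads.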




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2015-09-16 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14768899#comment-14768899
 ] 

Jan Høydahl commented on SOLR-7925:
---

Not sure if I agree that this belongs in Solr. Clients should handle streaming 
from various sources, including compressed files...




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2015-09-15 Thread Chris Eldredge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746241#comment-14746241
 ] 

Chris Eldredge commented on SOLR-7925:
--

Sounds potentially very useful when posting large amounts of data to Solr.




[jira] [Commented] (SOLR-7925) Implement indexing from gzip format file

2015-09-10 Thread Song Hyonwoo (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738274#comment-14738274
 ] 

Song Hyonwoo commented on SOLR-7925:


This patch helps save network bandwidth when you upload a file to a remote 
Solr server.
If you frequently need to upload big files to a remote Solr, this patch lets 
you send them in gzipped format. 
This is especially useful when your system's network is busy.

You can test it like this:
$ cd solr/core
$ ant test -Dtestcase=GZipCompressedUpdateRequestHandlerTest

Thanks.

