[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273002#comment-14273002
 ] 

Erik Hatcher commented on SOLR-6959:


The URL to post files to is determined on a per-file basis. When posting a 
directory of files, for example, .xml files go to /update and .pdf files go to 
/update/extract. The logging message does qualify that it is the base URL.

Would you want the URL logged for *every* file?
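To illustrate the per-file routing described above, here is a minimal Java sketch. It is not SimplePostTool's actual code: the class and method names are hypothetical, and it assumes that XML, JSON and CSV go straight to the base URL while every other auto-mode type gets `/extract` appended.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch (not SimplePostTool's actual code): choose the target URL
// per file in auto mode. The extension set is an assumption.
class PostUrlSketch {
    // Formats assumed to be handled directly by the base /update handler.
    static final Set<String> DIRECT = Set.of("xml", "json", "csv");

    // Returns the lower-cased extension, or "" if the name has no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Everything else (pdf, doc, ...) gets "/extract" appended to whatever base was given.
    static String targetUrl(String baseUrl, String fileName) {
        return DIRECT.contains(extensionOf(fileName)) ? baseUrl : baseUrl + "/extract";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/techproducts/update";
        for (String f : new String[] {"hd.xml", "books.csv", "solr-word.pdf"}) {
            System.out.println(f + " -> " + targetUrl(base, f));
        }
    }
}
```

Because the suffix is simply appended to the base, a custom base such as `.../update2` would likewise become `.../update2/extract` for a PDF.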

 SimplePostTool reports incorrect base url for PDFs
 --

 Key: SOLR-6959
 URL: https://issues.apache.org/jira/browse/SOLR-6959
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 5.0
Reporter: Alexandre Rafalovitch
Assignee: Erik Hatcher
Priority: Minor
  Labels: tools

 {quote}
 $ java -Dc=techproducts -Dauto -Dcommit=no -jar post.jar solr-word.pdf
 SimplePostTool version 1.5
 Posting files to base url http://localhost:8983/solr/techproducts/update..
 {quote}
 This command will *not* post to */update*, it will post to */update/extract*. 
 This should be reported correspondingly.
 From the server log:
 {quote}
 127.0.0.1 -  -  \[11/Jan/2015:17:17:10 +] POST 
 /solr/techproducts/update/extract?resource.name=
 {quote}
 It would make sense for that message to be after the auto-mode determination 
 just before the actual POST.
 Also, what's with two dots after the url? If it is _etc_, it should probably 
 be three dots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273012#comment-14273012
 ] 

Alexandre Rafalovitch commented on SOLR-6959:
-

This is a very interesting and educational question. The fact that 
*/update* is a *base* is not well explained anywhere. I just ran this test:
{quote}
java -Durl=http://localhost:8983/solr/techproducts/update2 -Dauto -jar post.jar *
{quote}

And it did *POST /solr/techproducts/update2/extract* for the PDF file. Not 
what I expected, somehow.

My main concern is reducing the magic through a better message. If somebody 
posted a file and something unexpected happened, one of their troubleshooting 
steps would be to follow the _request handler_ and its parameters. But we 
don't tell them here which request handler it is. We give only one piece of 
information here, which just happens to also be a valid _request handler_.

They could pick that information up from the log file, I guess, if they had 
access to it and knew what to look for. But it would be easier if the tool were 
clearer about it, as the user does not know exactly what happened.

What if we add something like this to the message:
{quote}
POSTing file books.csv (text/csv) to \[base]
POSTing file solr-word.pdf (application/pdf) to \[base]/extract
{quote}

Where the word \[base] is just that - the word.
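A minimal Java sketch of how the proposed per-file message could be produced. The class name and the extension-to-MIME map are hypothetical assumptions, not SimplePostTool's code; `[base]` is printed literally, as proposed, and only rich documents get the `/extract` suffix.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed per-file log message; the MIME map is an assumption.
class PostMessageSketch {
    static final Map<String, String> TYPES = Map.of(
        "xml", "application/xml",
        "csv", "text/csv",
        "json", "application/json",
        "pdf", "application/pdf");

    // Formats assumed to go straight to the base handler.
    static final Set<String> DIRECT = Set.of("xml", "csv", "json");

    static String message(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = TYPES.getOrDefault(ext, "application/octet-stream");
        // "[base]" is the literal word, not an expanded URL.
        String target = DIRECT.contains(ext) ? "[base]" : "[base]/extract";
        return String.format("POSTing file %s (%s) to %s", fileName, type, target);
    }

    public static void main(String[] args) {
        System.out.println(message("books.csv"));
        System.out.println(message("solr-word.pdf"));
    }
}
```

Logging this once per file, right before the POST, would place the message after the auto-mode decision, as the original report suggests.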

This could also clarify a bit the situation with the fact that XML, CSV, and 
JSON go to the same handler, yet we have - slightly confusingly - request 
handlers for both CSV and JSON in the solrconfig.xml.

The help message for the tool needs to be improved as well. It says 
*solr-update-url* and nothing about base and suffixes.


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272996#comment-14272996
 ] 

Alexandre Rafalovitch commented on SOLR-6959:
-

Also, at least the parameters passed with -Dparams are shown in that log 
message. The PDF code adds some parameters internally (like literal.id). Should 
they be shown as well? They are very long though (full file path).
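To show why these internal parameters make the URL long, here is a minimal Java sketch that appends them to the extract URL. The class name, the sample path, and the choice of the full path for both `literal.id` and `resource.name` are assumptions for illustration only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: appending internally generated parameters to the extract URL.
// Using the full file path as the value is an assumption for illustration.
class ExtractParamsSketch {
    static String withParams(String url, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode keys and values; "/" becomes %2F, which lengthens paths further.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        String sep = url.contains("?") ? "&" : "?";
        return params.isEmpty() ? url : url + sep + query;
    }

    public static void main(String[] args) {
        String path = "/home/user/docs/solr-word.pdf"; // hypothetical path
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", path);
        params.put("resource.name", path);
        System.out.println(withParams("http://localhost:8983/solr/techproducts/update/extract", params));
    }
}
```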


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273114#comment-14273114
 ] 

ASF subversion and git services commented on SOLR-6959:
---

Commit 1651016 from [~ehatcher] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1651016 ]

SOLR-6959: Elaborate on URLs being POSTed to (merged from trunk r1651013)


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273176#comment-14273176
 ] 

Alexandre Rafalovitch commented on SOLR-6959:
-

This output is in my book's current draft. You bet I don't want to explain why 
two different invocations do different things. Unless they actually do 
different things. :-)

 SimplePostTool reports incorrect base url for PDFs
 --

 Key: SOLR-6959
 URL: https://issues.apache.org/jira/browse/SOLR-6959
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 4.10.3
Reporter: Alexandre Rafalovitch
Assignee: Erik Hatcher
Priority: Minor
  Labels: tools
 Fix For: 5.0, Trunk


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273171#comment-14273171
 ] 

ASF subversion and git services commented on SOLR-6959:
---

Commit 1651028 from [~ehatcher] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1651028 ]

SOLR-6959: standardize XML content-type (merged from trunk r1651027)


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273170#comment-14273170
 ] 

ASF subversion and git services commented on SOLR-6959:
---

Commit 1651027 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1651027 ]

SOLR-6959: standardize XML content-type


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273113#comment-14273113
 ] 

ASF subversion and git services commented on SOLR-6959:
---

Commit 1651015 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1651015 ]

SOLR-6959: Elaborate on URLs being POSTed to


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273119#comment-14273119
 ] 

Erik Hatcher commented on SOLR-6959:


bq. This could also clarify a bit the situation with the fact that XML, CSV, 
and JSON go to the same handler, yet we have - slightly confusingly - request 
handlers for both CSV and JSON in the solrconfig.xml

Well, if someone is using post.jar, chances are he/she isn't aware of the 
additional handlers that you mention, so I don't think there would be any 
confusion. Those handlers are just there for backwards compatibility (or for 
aesthetics, if one likes to post to, say, /update/csv). I don't think we need 
to do anything different here.


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273133#comment-14273133
 ] 

Alexandre Rafalovitch commented on SOLR-6959:
-

Looks good. Except this now uncovers a little wrinkle:
{quote}
$ java -Dc=techproducts -jar post.jar hd.xml
SimplePostTool version 1.5
Posting files to \[base] url http://localhost:8983/solr/techproducts/update 
using content-type application/xml...
POSTing file hd.xml to \[base]
{quote}

vs.

{quote}
$ java -Dc=techproducts -Dauto -jar post.jar hd.xml
SimplePostTool version 1.5
Posting files to \[base] url http://localhost:8983/solr/techproducts/update...
Entering auto mode. File endings considered are 
xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file hd.xml (text/xml) to \[base]
{quote}

Is there a reason we are using different content types for the same XML file 
with and without *-Dauto*?
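One way to rule out this kind of drift is to have both code paths read the XML content type from a single constant. The Java sketch below is hypothetical: the class name, the map contents, and the assumption that non-auto mode defaults to application/xml for every file are illustrative, not taken from SimplePostTool.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: one shared constant so an .xml file gets the same
// content type with or without auto mode. Map contents are assumptions.
class ContentTypeSketch {
    static final String XML_TYPE = "application/xml";
    static final Map<String, String> BY_EXTENSION = Map.of(
        "xml", XML_TYPE,
        "json", "application/json",
        "csv", "text/csv");

    static String contentType(String fileName, boolean autoMode) {
        if (!autoMode) {
            return XML_TYPE; // assumed default when auto mode is off
        }
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println("without auto: " + contentType("hd.xml", false));
        System.out.println("with auto:    " + contentType("hd.xml", true));
    }
}
```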



[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273121#comment-14273121
 ] 

Alexandre Rafalovitch commented on SOLR-6959:
-

Actually, these days, these two handlers are commented out in the source code 
and are instead hard-coded as implicit handlers, which causes confusion of its 
own (SOLR-6938). FWIW.


[jira] [Commented] (SOLR-6959) SimplePostTool reports incorrect base url for PDFs

2015-01-11 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273174#comment-14273174
 ] 

Erik Hatcher commented on SOLR-6959:


bq. Except this now uncovers a little wrinkle...

OK, OK! :) Dang, you're thorough, and seriously, thanks for that. Aligned to 
application/xml; there was no (good) reason they were different.
