A good HTTP debugger (Wireshark or Fiddler2, for example) would likely help:

http://www.telerik.com/fiddler
https://www.wireshark.org/download.html

For example, it could show the HTTP headers that the "client" uses to request info from an API, then show the results of that query.

One small caveat: I have not tried this with a "standalone" server or with any Solr-type project.

Cheers!
Steve
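If putting a network monitor in place is hard, a rough alternative is to replay the request from a small script on the Solr machine and look at what comes back. The sketch below is only an illustration, assuming Python 3 is available there: it first checks the stored URL for characters a browser paste would quietly tolerate (stray whitespace, control characters, non-ASCII), then fetches the URL with a non-browser User-Agent. The function names and the "Java/1.7.0" User-Agent value are made up for this example and are not what DIH/Tika actually sends.

```python
import urllib.error
import urllib.request


def find_suspicious_chars(url):
    """Return (index, repr) pairs for characters that rarely belong in a raw URL."""
    return [
        (i, repr(ch))
        for i, ch in enumerate(url)
        # whitespace, non-printable characters, DEL, and anything non-ASCII
        if ch.isspace() or not ch.isprintable() or ord(ch) > 126
    ]


def fetch_status(url, user_agent="Java/1.7.0"):
    """Request the URL as a non-browser client would; return (status, headers).

    The default User-Agent is a placeholder assumption, not the real Tika value.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry
        # the status code and the response headers.
        return err.code, dict(err.headers)


if __name__ == "__main__":
    # Paste the field value exactly as the Solr query returns it.
    url = "http://www.someaddress.com/documents/document1.docx"
    print(find_suspicious_chars(url))
    print(fetch_status(url))
```

An empty list from the first call means the stored value has no obviously odd characters. If the second call returns 200 while Tika still gets a 400, the difference is more likely in the request DIH builds than in the URL itself.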
> From: teag...@insystechinc.com
> To: solr-user@lucene.apache.org
> Subject: RE: Tika HTTP 400 Errors with DIH
> Date: Fri, 5 Dec 2014 12:03:23 -0500
>
> Alex,
>
> Your suggestion might be a solution, but the issue isn't that the resource
> isn't found. Like Walter said 400 is a "bad request" which makes me wonder,
> what is the DIH/Tika doing when trying to access the documents? What is the
> "request" that is bad? Is there any other way to suss this out? Placing a
> network monitor in this case would be on the extreme end of difficult.
>
> I know that the URL stored is good and that the resource exists by copying it
> out of a Solr query and pasting it into the browser, so that eliminates 404
> and 500 errors. Is the format of the URL correct? Is there some other setting
> I've missed?
>
> I appreciate the suggestions!
>
> -Teague
>
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 04, 2014 12:22 PM
> To: solr-user
> Subject: Re: Tika HTTP 400 Errors with DIH
>
> Right. Resource not found (on server).
>
> The end result is the same. If it works in the browser but not from the
> application than either not the same URL is being requested or - somehow -
> not even the same server.
>
> The solution (watching network traffic) is still the same, right?
>
> Regards,
> Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and
> newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers
> community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 4 December 2014 at 11:51, Walter Underwood <wun...@wunderwood.org> wrote:
> > No, 400 should mean that the request was bad. When the server fails, that
> > is a 500.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/
> >
> >
> > On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch <arafa...@gmail.com>
> > wrote:
> >
> >> 400 error means something wrong on the server (resource not found).
> >> So, it would be useful to see what URL is actually being requested.
> >>
> >> Can you run some sort of network tracer to see the actual network
> >> request (dtrace, Wireshark, etc)? That will dissect the problem into
> >> half for you.
> >>
> >> Regards,
> >> Alex.
> >> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources
> >> and newsletter: http://www.solr-start.com/ and @solrstart Solr
> >> popularizers community: https://www.linkedin.com/groups?gid=6713853
> >>
> >>
> >> On 4 December 2014 at 09:42, Teague James <teag...@insystechinc.com> wrote:
> >>> The database stores the URL as a CLOB. Querying Solr shows that the field
> >>> value is "http://www.someaddress.com/documents/document1.docx"
> >>> The URL works if I copy and paste it to the browser, but Tika gets a 400
> >>> error.
> >>>
> >>> Any ideas?
> >>>
> >>> Thanks!
> >>> -Teague
> >>> -----Original Message-----
> >>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> >>> Sent: Tuesday, December 02, 2014 1:45 PM
> >>> To: solr-user
> >>> Subject: Re: Tika HTTP 400 Errors with DIH
> >>>
> >>> On 2 December 2014 at 13:19, Teague James <teag...@insystechinc.com>
> >>> wrote:
> >>>> clob="true"
> >>>
> >>> What does ClobTransformer is doing on the DownloadURL field? Is it
> >>> possible it is corrupting the value somehow?
> >>>
> >>> Regards,
> >>> Alex.
> >>>
> >>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources
> >>> and newsletter: http://www.solr-start.com/ and @solrstart Solr
> >>> popularizers community: https://www.linkedin.com/groups?gid=6713853
> >>>
> >
>