*This is the output from running my spider on the command line with scrapy crawl:*


' <p class="indent">The petitioner brought this suit for '

'damages under the Jones Act,<a class="footnote" href="#fn1" '

'id="fn1_ref">1</a> alleging that her husband while employed by '

'the respondent railroad as a tug fireman was drowned because of '

'the negligent failure of respondent to provide him with a safe '

'place to work. The District Judge directed the jury to return a '


*Here is the same text in the shell with response.text:*


In [2]: response.text

\n <p class="date">Argued March 27, 28, 1956.</p>\n <p class="date">Decided 
April 9, 1956.</p>\n <div class="prelims">\n <p class="indent">Mr. Nathan 
Baker, New York City, for petitioner.</p>\n <p class="indent">Mr. Joseph P. 
Allen, New York City, for respondent.</p>\n <p class="indent">Mr. Justice 
BLACK delivered the opinion of the Court.</p>\n </div>\n <div class="num" 
id="p1">\n <span class="num">1</span>\n <p class="indent">The petitioner 
brought this suit for damages under the Jones Act,<a class="footnote" 
href="#fn1" id="fn1_ref">1</a> alleging that her husband while employed by 
the respondent railroad as a tug fireman was drowned because of the 
negligent failure of respondent to provide him with a safe place to work. 
The District Judge directed the jury to 


*And the same text again using fetch:*


<p class="date">Argued March 27, 28, 1956.</p>

<p class="date">Decided April 9, 1956.</p>

<div class="prelims">

<p class="indent">Mr. Nathan Baker, New York City, for petitioner.</p>

<p class="indent">Mr. Joseph P. Allen, New York City, for respondent.</p>

<p class="indent">Mr. Justice BLACK delivered the opinion of the Court.</p>

</div>

<div class="num" id="p1">

<span class="num">1</span>

<p class="indent">The petitioner brought this suit for damages under the 
Jones Act,<a class="footnote" href="#fn1" id="fn1_ref">1</a> alleging that 
her husband while employed by the respondent railroad as a tug fireman was 
drowned because of the negligent failure of respondent to provide him with 
a safe place to work. The District Judge directed the jury to 




The first result comes from my spider, which omits all the preliminary 
material. Note how the text is squeezed in (on both sides, though the left 
indentation didn't copy over), as if it were in a narrow column. It has 
newlines only at the end of a paragraph, but each displayed line appears to 
be its own string, wrapped in single quotes. The spider also uses items, if 
that makes a difference.


The second is response.text. It contains \n newlines but appears to be a 
single unformatted string. Note how it runs all the way to both margins.


The third uses fetch. Now it "respects" the HTML tags/layout and shows no 
\n sequences. It also runs margin to margin.
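
A possible explanation for the three shapes described above, sketched in plain Python without Scrapy: the same string looks different depending on whether it is shown with repr(), pretty-printed with the standard pprint module, or printed directly. This assumes the spider's item is rendered through pprint when it is logged, which I believe is how scrapy.Item builds its repr, but treat that as an assumption. The sample text, the item dict, and width=60 are made up for illustration.

```python
import ast
from pprint import pformat

# Hypothetical sample standing in for the page body.
text = (
    '<p class="indent">The petitioner brought this suit for damages under '
    'the Jones Act, alleging that her husband was drowned.</p>\n'
    '<p class="indent">The District Judge directed the jury.</p>\n'
)
item = {"text": text}  # hypothetical stand-in for the spider's item

# 1. repr(): one long quoted string with newlines shown as \n escapes.
#    This is what an interactive shell echoes for a bare expression.
as_repr = repr(text)
print(as_repr)

# 2. pprint: a long string is wrapped into several adjacent quoted pieces
#    that fit the given width, which gives the "narrow column" look.
as_pretty = pformat(item, width=60)
print(as_pretty)

# 3. print(): the raw text, with real line breaks and no quotes.
print(text)

# The quoted pieces are only a display wrap. Evaluated as Python literals,
# they concatenate back into the original value.
restored = ast.literal_eval(as_pretty)
print(restored == item)
```

If that assumption holds, the stored value is identical in all three cases and only the display differs. Writing the field out yourself, for example with print(item["text"]) or to a file, would show it without the quotes and wrapping.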


But neither the second nor the third example uses my spider, even when I 
tried the --spider=myspider flag. That's why all the preliminary material is 
in there.


*And here is a copy-paste of the same text from the original page, which 
also goes margin to margin:*


The petitioner brought this suit for damages under the Jones Act,1 alleging 
that her husband while employed by the respondent railroad as a tug fireman 
was drowned because of the negligent failure of respondent to provide him 
with a safe place to work. The District Judge directed the jury to return a 


*So my question is: why am I getting these differences, and how do I take 
control of them?*
