Hey David. Is this an open/public site? If it is, can you list the basic steps a user would follow in a browser to see what you're seeing?
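In the meantime, a few things in the quoted snippet may be worth ruling out, assuming the standard Selenium Python bindings. The third argument of WebDriverWait (poll_frequency) is in seconds, so a value of 2 means two seconds, not two milliseconds. The locator constant is By.ID rather than By.id. The expected-condition helpers take a single (By, value) tuple. The handler also receives `driver` but the waits use `self.driver`, which may or may not be the same object. The waiting itself is just polling with a timeout. Here is a stdlib-only sketch of that idea; `wait_for` and `iframe_content_ready` are hypothetical names for illustration, not part of Selenium or Scrapy. In the real handler the condition would presumably switch into the "content" iframe and look for the sectionItems div.

```python
import time


def wait_for(condition, timeout_sec=20.0, poll_sec=0.2):
    """Call condition() until it returns a truthy value or the timeout expires.

    Both timeout_sec and poll_sec are in seconds.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout_sec)
        time.sleep(poll_sec)


if __name__ == "__main__":
    attempts = []

    def iframe_content_ready():
        # Stand-in for "the iframe has been filled in by the page's JS":
        # pretends the content only shows up on the third poll.
        attempts.append(1)
        return "sectionItems found" if len(attempts) >= 3 else None

    print(wait_for(iframe_content_ready, timeout_sec=2.0, poll_sec=0.05))
```

Returning the condition's own truthy result lets the caller get back whatever was found (an element, a page source string) without querying a second time.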
Thanks

On Fri, May 27, 2016 at 4:35 PM, David Fishburn <[email protected]> wrote:

> I have been struggling with this one for quite some time, finally giving
> up and asking here.
>
> I have a page which uses an iframe, which is totally JS created (no URLs
> to create it, uses SAPUI5).
>
> The body, when I request the page is this:
>
> <body class="sapUiBody" role="application">
>   <div id="ctrRoot"></div>
> </body>
>
> First, JS executes and creates:
>
> <body class="sapUiBody" role="application" style="margin: 0px;">
>   <div id="ctrRoot" data-sap-ui-area="ctrRoot">
>     <div id="__shell0" data-sap-ui="__shell0" class="sapDkShell
>       sapUiUx3Shell sapUiUx3ShellDesignStandard sapUiUx3ShellFullHeightContent
>       sapUiUx3ShellHeadStandard sapUiUx3ShellNoContentPadding">
>       ... Lots of crap here ...
>     </div>
>   </div>
> </body>
>
> Eventually, the following gets added in the ... Lots of crap here ....
> section with many nested <div> tags
>
> <div id="demokitSplitter_secondPane" class="sapUiVSplitterSecondPane"
>      style="overflow: hidden; width: 79.7396%;">
>   <iframe id="content" name="content" src="about:blank" frameborder="0"
>           onload="sap.ui.demokit.DemokitApp.getInstance().onContentLoaded();"
>           data-sap-ui-preserve="content">
>   </iframe>
> </div>
>
> This is the part that has the iframe.
>
> Eventually, the iframe is replaced with:
>
> <div id="demokitSplitter_secondPane" class="sapUiVSplitterSecondPane"
>      style="overflow: hidden; width: 79.7396%;">
>   <iframe id="content" name="content" src="about:blank" frameborder="0"
>           onload="sap.ui.demokit.DemokitApp.getInstance().onContentLoaded();"
>           data-sap-ui-preserve="content">
>     <html xml:lang="en" lang="en" data-highlight-query-terms="pending">
>       <body>
>         <div id="main">
>           <div id="content">
>             <div class="full-description">
>             </div>
>             <div class="summary section">
>               <div class="sectionItems">
>                 <div class="sectionItem itemName namespace static">
>                   <b class="icon" title="Analysis Path Framework">
>                     <a href="test.html">test</a>
>                   </b>
>                   <span class="description">Analysis Path Framework</span>
>                 </div>
>                 <div class="sectionItem itemName namespace static">
>                   <b class="icon" title="Test2">
>                     <a href="test.html">test2</a>
>                   </b>
>                   <span class="description">Test2</span>
>                 </div>
>               </div>
>             </div>
>           </div>
>         </div>
>       </body>
>     </html>
>   </iframe>
> </div>
>
> What I need to get access to:
> <div class="sectionItems">
>
> And cycle through all these:
> <div class="sectionItem itemName namespace static">
> <div class="sectionItem itemName namespace static">
>
> I can't seem to get my PhantomJS downloader to work.
>
> I have tried all the following attempts to try to wait to get that text:
>
> def _response(self, _, driver, spider):
>     print 'PhantomJSDownloadHandler _response writing first.html, possibly empty html (due to AJAX) %s' %(time.asctime( time.localtime(time.time()) ))
>     target = codecs.open('first.html', 'w', "utf-8")
>     target.truncate()
>     target.write(driver.page_source)
>     target.close()
>
>     try:
>         print 'PhantomJSDownloadHandler waiting for sectionTitles %s' %(time.asctime( time.localtime(time.time()) ))
>         max_time_to_wait_sec = 20
>         time_between_polls_milli = 2
>
>         #element = WebDriverWait(driver, max_time_to_wait_sec, time_between_polls_milli).until(EC.presence_of_element_located((By.CLASS_NAME, "sectionItems")))
>         #element = WebDriverWait(driver, max_time_to_wait_sec).until(EC.presence_of_element_located((By.CLASS_NAME, "sapUiVSplitterSecondPane")))
>         #element = self.driver.find_elements_by_xpath('//div[@class="sectionItems"]')
>         #element = self.driver.find_elements_by_xpath('//iframe')
>         #WebDriverWait(self.driver,20,poll_frequency=.2).until(EC.visibility_of(element))
>         #WebDriverWait(self.driver,20,poll_frequency=.2).until(EC.frame_to_be_available_and_switch_to_it(By.id("content")))
>         WebDriverWait(self.driver,20,poll_frequency=.2).until(EC.frame_to_be_available_and_switch_to_it((By.id, "content")))
>         #WebDriverWait(self.driver,20,poll_frequency=.2).until(EC.visibility_of_element_located(By.CLASS_NAME, "sectionItems"))
>
> Some of the posts on stackoverflow talk about this:
>
> http://stackoverflow.com/questions/25057174/scrapy-crawl-in-order
>
> def parse(self, response):
>     for link in response.xpath("//article/a/@href").extract():
>         yield Request(link, callback=self.parse_page, meta={'link':link})
>
> def parse_page(self, response):
>     for frame in response.xpath("//iframe").extract():
>         item = MyItem()
>         item['link'] = response.meta['link']
>         item['frame'] = frame
>         yield item
>
> But this looks like it is trying to fetch a link (URL) but my iframe does
> it via a JS function, not a URL.
>
> Now, assuming someone can actually help me with the downloader, so it can
> wait until the sectionItems div is available.
>
> Then in Scrapy, I need to iterate through those results. I have this code
> written:
>
> # Working, finds first SectionsItems
>
> print 'checking for <div class="sectionItems">'
> sectionItems = namespace.xpath(".//div[@class='summary section']/div[@class='sectionItems']")
> #sections = hxs.xpath("//div[@class='sectionItem']")
> #sections = hxs.xpath("//div[contains(@class, 'sectionItem itemName namespace static')]")
> #sections = hxs.xpath("//<div class="sectionTitle">Namespaces & Classes</div>/div[@class='sectionItems']")
>
> print 'xpath SectionItems:%s' %sectionItems
>
> for sectionItem in sectionItems:
>     print 'Found SectionItem:'
>     #sections = sectionItem.xpath("div[@class='sectionItem']")
>     sections = sectionItem.xpath("div[re:test(@class, 'sectionItem')]")
>     #sections = sectionItem.xpath("div[re:test(@class, 'sectionItem itemName namespace static')]")
>
>     for section in sections:
>         print 'Found Section:%s' %(section.extract())
>
> Any help is greatly appreciated.
> David
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
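On the iteration step in the quoted message: an XPath test like @class='sectionItem' compares the whole attribute string, so it cannot match class="sectionItem itemName namespace static". A plain substring or regex test for 'sectionItem' would also match 'sectionItems'. Matching on individual class tokens avoids both problems. Below is a minimal stdlib-only sketch of that idea, assuming the iframe's markup has already been obtained and is well-formed enough for an XML parser (real pages often are not). `SAMPLE`, `has_class` and `extract_section_items` are hypothetical names, and the sample markup is trimmed from the quoted message.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the iframe body from the quoted message (illustrative only).
SAMPLE = """
<div id="content">
  <div class="summary section">
    <div class="sectionItems">
      <div class="sectionItem itemName namespace static">
        <b class="icon" title="Analysis Path Framework"><a href="test.html">test</a></b>
        <span class="description">Analysis Path Framework</span>
      </div>
      <div class="sectionItem itemName namespace static">
        <b class="icon" title="Test2"><a href="test.html">test2</a></b>
        <span class="description">Test2</span>
      </div>
    </div>
  </div>
</div>
"""


def has_class(element, token):
    """True if token is one of the whitespace-separated class names."""
    return token in element.get("class", "").split()


def extract_section_items(markup):
    """Return one dict per <div class="... sectionItem ..."> entry."""
    root = ET.fromstring(markup)
    items = []
    for container in root.iter("div"):
        if not has_class(container, "sectionItems"):
            continue
        # Only direct children of the sectionItems container are entries.
        for child in container.findall("div"):
            if not has_class(child, "sectionItem"):
                continue
            link = child.find(".//a")
            desc = child.find("span")
            items.append({
                "name": link.text if link is not None else None,
                "href": link.get("href") if link is not None else None,
                "description": desc.text if desc is not None else None,
            })
    return items


if __name__ == "__main__":
    for item in extract_section_items(SAMPLE):
        print(item)
```

With Scrapy selectors, the same token test is commonly written in XPath 1.0 as contains(concat(' ', normalize-space(@class), ' '), ' sectionItem ').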
